Standard SQL/JSON and MySQL 5.7 JSON
Support for storing and querying JSON within SQL is progressing for the ANSI/ISO SQL Standard, and for MySQL 5.7. I’ll look at what’s new, and do some comparisons.
The big picture
The standard document says
The SQL/JSON path language is a query language used by certain SQL operators (JSON_VALUE, JSON_QUERY, JSON_TABLE, and JSON_EXISTS, collectively known as the SQL/JSON query operators) to query JSON text. The SQL/JSON path language is not, strictly speaking, SQL, though it is embedded in these operators within SQL. Lexically and syntactically, the SQL/JSON path language adopts many features of ECMAScript, though it is neither a subset nor a superset of ECMAScript. The semantics of the SQL/JSON path language are primarily SQL semantics.
Here is a chart that shows the JSON-related data types and functions in the standard, and whether a particular DBMS has something with the same name and a similar functionality.
Standard                   Oracle   SQL Server   MySQL
--------                   ------   ----------   -----
Conventional data type     YES      YES          NO
JSON_VALUE function        YES      YES          NO
JSON_EXISTS function       YES      NO           NO
JSON_QUERY function        YES      YES          NO
JSON_TABLE function        YES      NO           NO
My source re the standard is a draft copy of ISO/IEC 9075-2 SQL/Foundation. For Oracle 12c read Oracle’s whitepaper. For SQL Server 2016 read MSDN’s blog. My source re MySQL 5.7 is the MySQL manual and the latest source-code download of version 5.7.9.
Now, what is the significance of the lines in the chart?
Conventional data type
By “conventional”, I mean that in standard SQL JSON strings should be stored in one of the old familiar data types: VARCHAR, CLOB, etc. It didn’t have to be this way, and any DBMS that supports user-defined types can let users be more specific, but that’s what Oracle and Microsoft accept.
MySQL 5.7, on the other hand, has decided that JSON shall be a new data type. It’s closely related to LONGTEXT: if you say
CREATE TABLE j1 AS SELECT UPPER(CAST('{}' AS JSON));
SHOW CREATE TABLE j1;
then you get LONGTEXT. But if you use the C API to ask for the column’s data type, you get MYSQL_TYPE_JSON=245 (aside: this is not documented). And JSON differs from LONGTEXT because, if you try to put in non-JSON data, you get an error message.
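For example, here is a minimal sketch (the table name is hypothetical, and the exact error text may vary between 5.7 point releases):

CREATE TABLE j0 (s1 JSON);
INSERT INTO j0 VALUES ('{"a": 1}');   -- succeeds: the value is valid JSON
INSERT INTO j0 VALUES ('not json');   -- fails with an "Invalid JSON text" error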
At least, that’s the theory. It didn’t take me long to find a way to put non-JSON data in:
CREATE TABLE j2a (s1 INT, s2 JSON);
INSERT INTO j2a VALUES (1,'{"a": "VALID STUFF"}');
CREATE TABLE j2b AS SELECT s1, UPPER(s2) AS s2 FROM j2a;
INSERT INTO j2b VALUES (NULL, 'INVALID STUFF');
ALTER TABLE j2b MODIFY COLUMN s2 JSON;
… That unfortunately works, and now if I say “SELECT * FROM j2b;” I get an error message “The JSON binary value contains invalid data”. Probably bugs like this will disappear soon, though.
By making a new data type, MySQL has thrown away some of the advantages that come with VARCHAR or TEXT. One cannot specify a maximum size — everything is like LONGTEXT. One cannot specify a preferred character set and collation — everything is utf8mb4 and utf8mb4_bin. One cannot take advantage of all the string functions — BIN() gives meaningless results, for example. And the advantage of automatic validity checking could have been delivered with efficient constraints or triggers instead. So why have a new data type?
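Regarding the triggers alternative just mentioned: here is a sketch of what validity checking on a plain TEXT column could look like, using MySQL 5.7’s JSON_VALID() function and a hypothetical table and trigger name (MySQL 5.7 parses CHECK constraints but does not enforce them, so a trigger is the realistic route):

CREATE TABLE j3 (s1 TEXT);
-- DELIMITER is a mysql-client command, needed because the trigger body contains semicolons
DELIMITER //
CREATE TRIGGER j3_bi BEFORE INSERT ON j3 FOR EACH ROW
BEGIN
  -- reject any non-NULL value that JSON_VALID() says is not well-formed JSON
  IF NOT JSON_VALID(NEW.s1) THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid JSON text in column s1';
  END IF;
END//
DELIMITER ;

A matching BEFORE UPDATE trigger would be needed as well.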
Well, PostgreSQL has a JSON data type. As I’ve noticed before, PostgreSQL can be a poor model if one wants to follow the standard. And it will not surprise me if the MariaDB folks also decide to make a JSON data type, because I know that they are following similar logic for an “IP address” data type.
By the way, the validity checking is fairly strict. For example, ‘{x:3}’ is considered invalid because quote marks are missing, and ‘{“x”:.2}’ is considered invalid because the value has no leading digit.
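A quick way to see this is with MySQL 5.7’s JSON_VALID() function (not otherwise discussed in this post, but present in 5.7):

SELECT JSON_VALID('{x:3}');      -- returns 0: the key is not quoted
SELECT JSON_VALID('{"x":.2}');   -- returns 0: the number has no leading digit
SELECT JSON_VALID('{"x":0.2}');  -- returns 1: valid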
JSON_VALUE function
For illustration it’s enough to describe the standard’s JSON_VALUE and MySQL’s JSON_EXTRACT.
The standard idea is: pass a JSON string and a JavaScript-like expression, get back an SQL value, which will generally be a scalar value. For example,
SELECT JSON_VALUE(@json, @path_string) FROM t;
SELECT JSON_VALUE(json_column_name, 'lax $.c') AS c FROM t;
There are optional clauses for deciding what to do if the JSON string is invalid, or contains missing and null components. Again, the standard’s JSON_VALUE is what Oracle and Microsoft accept. There’s some similarity to what has gone before with SQL/XML.
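As a sketch of what those optional clauses look like — this is my reading of the draft standard’s syntax, reusing the table and column names from the example above; MySQL 5.7 accepts none of it:

SELECT JSON_VALUE(json_column_name, 'lax $.c'
                  RETURNING VARCHAR(10)
                  DEFAULT 'none' ON EMPTY
                  NULL ON ERROR) AS c
FROM t;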
MySQL, on the other hand, accomplishes some similar things with JSON_EXTRACT. For example,
SELECT JSON_EXTRACT(@json, @path_string);
SELECT JSON_EXTRACT(json_column_name, '$.c') AS c FROM t;
And the result is not an ordinary MySQL scalar, it has type = JSON. In the words of physicist I.I.Rabi when confronted with a new particle, “Who ordered that?”
Well, JSON_EXTRACT and some of the other MySQL functions have fairly close analogues, in both name and functionality, with Google’s BigQuery and with SQLite. In other words, instead of the SQL standard, MySQL has ended up with something like the NoSQL No-standard.
I should stress here that MySQL is not “violating” the SQL standard with JSON_EXTRACT. It is always okay to use non-standard syntax. What’s not okay is to use standard syntax for a non-standard purpose. And here’s where I bring in the slightly dubious case of the “->” operator. In standard SQL “->”, which is called the “right arrow” operator, has only one purpose: dereferencing. In MySQL “->” has a different purpose: a shorthand for JSON_EXTRACT. Since MySQL will never support dereferencing, there will never be a conflict in practice. Nevertheless, technically, it’s a violation.
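To illustrate the shorthand (table and column names here are hypothetical; in MySQL 5.7 the left operand of “->” must be a JSON column reference):

CREATE TABLE t3 (json_col JSON);
INSERT INTO t3 VALUES ('{"c": "C"}');
SELECT json_col->'$.c' FROM t3;                -- the shorthand
SELECT JSON_EXTRACT(json_col, '$.c') FROM t3;  -- the equivalent long form

Both SELECTs return “C” with the quote marks, because the result is again a JSON value.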
Observed Behaviour
When I tried out the JSON data type with MySQL 5.7.9, I ran into no exciting bugs, but a few features.
Consistency doesn’t apply for INSERT IGNORE and UPDATE IGNORE. For example:
CREATE TABLE t1 (date DATE, json JSON);
INSERT IGNORE INTO t1 (date) VALUES ('invalid date');
INSERT IGNORE INTO t1 (json) VALUES ('{invalid json}');
The INSERT IGNORE into the date column inserts null with a warning, but the INSERT IGNORE into the json column returns an error.
Some error messages might still need adjustment. For example:
CREATE TABLE ti (id INT, json JSON) PARTITION BY HASH(json);
Result: an error message = “A BLOB field is not allowed in partition function”.
Comparisons of JSON_EXTRACT results don’t work. For example:
SET @json = '{"a":"A","b":"B"}'; SELECT GREATEST( JSON_EXTRACT(@json,'$.a'), JSON_EXTRACT(@json,'$.b'));
The result is a warning “This version of MySQL doesn’t yet support ‘comparison of JSON in the LEAST and GREATEST operators'”, which is a symptom of the true problem, that JSON_EXTRACT returns a JSON value instead of a string value. The workaround is:
SET @json = '{"a":"A","b":"B"}'; SELECT GREATEST( CAST(JSON_EXTRACT(@json,'$.a') AS CHAR), CAST(JSON_EXTRACT(@json,'$.b') AS CHAR));
… which returns “B” — a three-character string, including the quote marks.
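Incidentally, MySQL 5.7 also has a JSON_UNQUOTE function (not otherwise covered here), which strips the surrounding quote marks if a plain one-character string is what’s wanted:

SET @json = '{"a":"A","b":"B"}';
SELECT JSON_UNQUOTE(JSON_EXTRACT(@json,'$.b'));   -- returns B, without the quote marks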
Not The End
The standard might change, and MySQL certainly will change anything that’s deemed wrong. Speaking of wrong, I might have erred too. And I certainly didn’t do justice to all the other details of MySQL 5.7 JSON.
Meanwhile
The Ocelot GUI client for MySQL and MariaDB is still version 0.8 alpha, but since the last report there have been bug fixes and improvements to the Help option. Have a look at the new manual by going to https://github.com/ocelot-inc/ocelotgui and scrolling down till you see the screenshots and the words “User Manual”.
Thanks for kicking the tires, Peter. I’ve filed MySQL bug#79432 for the problem you found with ALTER TABLE, so that it’s not forgotten.
Good article. Could you add two corrections for SQL Server:
1. SQL Server does not have JSON_EXTRACT (you can use JSON_VALUE/JSON_QUERY to check whether an object exists at some path)
2. SQL Server has OPENJSON, which is similar to JSON_TABLE. It has a different name and slightly different syntax because one goal was to stay aligned with SQL Server’s OPENXML function. However, the functionality is very similar.
Maybe you could also add a column for PostgreSQL. Although it does not follow the standard, it is a very popular RDBMS with JSON support.
Do you know whether IBM DB2 offers some JSON support (they should have something similar to MongoDB, but I’m not an expert on this)?
I did not say that SQL Server has JSON_EXTRACT but I think you might have intended to say “SQL Server does not have JSON_EXISTS” — in which case, thanks, I’ll correct that. It’s true that SQL Server’s OPENJSON is like JSON_TABLE, but as I said the chart is for something “with the same name”. PostgreSQL would have a column saying NO NO NO NO NO (http://www.postgresql.org/docs/9.3/static/functions-json.html). You are correct about a DB2/MongoDB collaboration; DB2 has JSON functions now; I expect that over time they’ll move toward the standard as has happened in the past.
[…] What about MariaDB and native JSON support? Native JSON support, indexing of pure virtual columns, and optimizer improvements to smartly decide to use such an index will come with MariaDB 10.2. These JSON features are part of the MariaDB 10.2 roadmap. Here is the MariaDB 10.2 roadmap. For a more detailed feature list for MariaDB 10.2: here is a full list of MariaDB 10.2 community-voted features. The MariaDB engineering team has taken its time to evaluate various aspects of a native JSON implementation. Two different implementations exist. Besides Oracle’s MySQL implementation, Facebook has developed its own implementation, called DocStore, which is part of the Facebook MySQL branch. Both have been evaluated by MariaDB engineering. Another area that must be addressed is the current evolution of the standard regarding SQL/JSON […]