The First Development Milestone for MySQL 8.0

MySQL 8.0.0 exists.

For general impressions we already have comments by Giuseppe Maxia and Stewart Smith and Serdar Yegulalp.

Two new features looked important to me: modern UCA collations, and roles. So I downloaded and tried them out.

Modern UCA collations

MySQL is going to switch to utf8mb4 for the default character set and add collations based on the the latest version of the Unicode Collation Algorithm (UCA 9.0.0). I still see messages indicating the default is still latin1, but they're incorrect, I can put 4-byte UTF-8 characters in columns that I created without explicitly saying utf8mb4.

The new collations are only for utf8mb4. That's suboptimal. People still have good reasons to use other character sets (I discussed some of them in an earlier blog post). And in any case, a collation that works for utf8mb4's repertoire will work for every character set that has a pure subset of that repertoire, which is to say, every character set.

The new collations are only for the generic "Default Unicode Collation Element Table" (DUCET), and for Latin-based alphabets. So there are no updates for "persian" or "sinhala".

For an example, the following table shows changes between the old Swedish collation (utf8mb4_swedish_ci) and the new one (utf8mb4_sv_0900_ai_ci). The "Rule" column has what Unicode says about certain Swedish primary (level-1) comparisons, the "Example" column has what an SQL comparison would look like, the "Old" column has the results I got with utf8mb4_swedish_ci, the "New" column has the results I got with utf8mb4_sv_0900_ai_ci.


Rule Example Old New
---------------------------- --------- ----- ----
ETH = D 'Ð' = 'D' FALSE TRUE
D STROKE = D 'Đ' = 'D' FALSE TRUE
THORN = TH 'Þ' = 'TH' FALSE TRUE
O DOUBLE ACUTE = O DIAERESIS 'Ő' = 'Ö' FALSE TRUE
U DOUBLE ACUTE = Y 'Ű' = 'Y' FALSE TRUE
L STROKE = L 'Ł' = 'L' FALSE TRUE
A DIAERESIS = E OGONEK 'Ä' = 'Ę' FALSE TRUE
OE = O DIAERESIS 'Œ' = 'Ö' FALSE TRUE
O CIRCUMFLEX = O DIAERESIS 'Ô' = 'Ö' FALSE TRUE

Most Swedes don't know about these rules, they apply to medieval texts or foreign names. But most Swedes do know that rules should cover the edge cases, not just the Stockholm phone book. Because it follows the Unicode rules, the new collation is better.

But the new collation's name is worse, for two reasons.

(1) The "_ai" suffix, meaning "accent insensitive", is Microsoftish. There is such a thing, but the definition of "accent" varies between languages and the factors that influence collations can be other things besides accents. Clearer suffixes for extra-sensitive collation names would be "w2" or "l2" (for weight=2 or level=2), and they're for sorting rather than searching unless you're Japanese, but a default = no-suffix-for-accents would have been okay.

(2) The "_sv" suffix, meaning "Swedish", is an unnecessary change. Compatibility with the previous suffix -- "swedish" -- would not have violated UCA specifications and would have been clearer for people who have used MySQL before.

For a second example, I looked at the new "Latin" collation, utf8mb4_la_0900_ai_ci. This time I couldn't find any rules file in the Unicode standard directory. There is a UCA chart for Latin but utf8mb4_la_0900_ai_ci obviously isn't following it at all. Instead it's like MySQL's old and silly "Roman" collation, where i=j and u=v. This is not an important collation. But MySQL claims the new collations follow UCA rules, and here is one that doesn't, so I worry about the others.

This has to be taken in context -- MySQL has far better support for character sets and collations than any other open-source DBMS, except sometimes MariaDB. And now it's a weenie bit more far better. Observations about paucity of new UCA collations, bad names, or standard non-compliance won't change that fact.

Roles

I discussed MariaDB's roles in 2014. MySQL's roles are already in the 8.0 documentation. Is MySQL making an improvement?

The first thing I noticed is that the syntax rules for roles are, too often, the same as the syntax rules for users. This is especially obvious when I ask for things that make no sense for roles, for example:

mysql>CREATE ROLE 'r'@'host';
OK 0 rows affected (0.1 seconds)

mysql>CREATE ROLE '';
Error 1396 (HY000) Operation CREATE USER failed for anonymous user

mysql>GRANT 'root'@'localhost' TO role_name;
OK 0 rows affected (0.1 seconds)

mysql>DROP USER role_name;
OK 0 rows affected (0.1 seconds)

Because of this, some non-standard limitations exist: maximum name length is 32, names are case sensitive, role names cannot be the same as user names, and there is no separate information_schema table

However, the DML statements that I tested for MariaDB do work with MySQL as well, and are often exactly the same:

MariaDB: CREATE [OR REPLACE] ROLE [IF NOT EXISTS] role_name [WITH ADMIN ...];
MySQL:   CREATE ROLE [IF NOT EXISTS] role_name [,role_name...];

MariaDB: DROP ROLE [IF EXISTS] role_name [,role_name...];
MySQL:   DROP ROLE [IF EXISTS] role_name [,role_name...];

MariaDB: SET DEFAULT ROLE {role_name|NONE} [FOR user_name];
MySQL:   SET DEFAULT ROLE ALL TO user_name [,user_name...];

MariaDB: SET ROLE {role_name|NONE};
MySQL:   SET ROLE {role_name|NONE};

MariaDB: SELECT CURRENT_ROLE() | CURRENT_ROLE;
MySQL:   SELECT CURRENT_ROLE();

MariaDB: [no exact equivalent]
MySQL:   GRANT CREATE ROLE ON *.* TO grantee;

MariaDB: SHOW GRANTS [FOR role_name];
MySQL:   SHOW GRANTS [FOR role_name];
MySQL:   SHOW GRANTS [FOR user_name USING role_name[,role_name...]];

MariaDB: GRANT role_name TO grantee [,grantee...] [WITH ADMIN OPTION];
MySQL:   GRANT role_name[,role_name...] TO grantee [,grantee...];

(The last GRANT example surprised me. MariaDB has trouble granting multiple roles in one statement, it's Bug#5772. MySQL appears to be "solving" it by making certain role names illegal unless they're delimited; I'm not sure that's the right way to solve it.)

Circular roles (GRANT r1 TO r2; GRANT r2 TO r1;) are allowed but I expect they'll be disallowed in a later version.

Example:

/* as a user with lots of privileges */
CREATE USER 'u'@'localhost';
CREATE ROLE r;
CREATE TABLE t1 (s1 INT);
CREATE TABLE t2 (s1 INT);
GRANT SELECT ON t1 TO r;
GRANT r TO 'u'@'localhost';
/* as user 'u'@'localhost' */
SET ROLE r;
SELECT * FROM t1;
SELECT * FROM t2;
/* The first SELECT succeeds, the second SELECT fails. */

To generalize: so far MySQL 8.0.0 allows creation of roles but they have to look like users. So the syntax is undesirable, but they work properly.

Again, remember the context. There's nothing wrong with a feature that's not ready, until MySQL declares that it's ready.

Typos

MySQL's announcement, buried in a section about minor fixes, says "Foreign key names as stored in the foreign_keys and foreign_key_column_usage tables are a maximum of 64 characters, per the SQL standard". Er, up to a point. The SQL standard says "In a regular identifier, the number of identifier parts shall be less than 128."

Us Too

We have a new-version announcement too. Version 1.0.3 of the Ocelot Graphical User Interface (ocelotgui) for MySQL and MariaDB came out on Tuesday September 27 2016. Some new items are ...

As well as getting result sets in the result-set widget, one can get them added to the history widget, with the same format as what the mysql client outputs.
shot_2016_0929_4

As well as predicting what the next word should be, Ocelot's syntax recognizer makes it possible to show hints if the user hovers over a word.
shot_2016_0929_3

Finally, there is a rudimentary formatter. Clicking the edit menu item Edit|Format will change indentation, make keywords upper case, etc. I say "rudimentary" because, without a standard to follow, one must depend on taste, and nobody shares the taste that's on display here.
shot_20160929_1shot_2016_0929_2

Documentation is now on ocelot.ca/index.htm. C++ source and Linux-ready packages are on github.com/ocelot-inc/ocelotgui.

, September 29, 2016. Category: MySQL.

About pgulutzan

Co-author of four computer books. Software Architect at MySQL/Sun/Oracle from 2003-2011, and at HP for a little while after that. Currently with Ocelot Computer Services Inc. in Edmonton Canada.

2 Comments

  1. Kristofer says:

    Interesting remarks on collation! Something to think about.

    "But most Swedes do know that rules should cover the edge cases, not just the Stockholm phone book"=>
    You shouldn't underestimate the influence of the Stockholm phone book. As a country boy I've surely experienced what access to big city directories can do. :)

    "Circular roles (GRANT r1 TO r2; GRANT r2 TO r1;) are allowed but I expect they'll be disallowed in a later version." =>

    Role hierarchies are resolved so that they form a tree using a breadth first strategy (see class Get_access_maps).
    Since an authorization ID can be either a USER or a ROLE or /both/, there can be cases where these seemingly circular references make sense. It depends on your business process basically.
    Consider authid feature_a(gen_feature_dev(contracted_dev))

    GRANT TO feature_a;
    GRANT feature_a TO gen_feature_dev; // 'circular' reference

    Now consider authid patch_team (gen_feature_dev(feature_a, contracted_developer))

    The SELECT roles_graphml() function is a nifty way to quickly get a graphical overview of the security hierarchies.

  2. Bernt Marius Johnsen says:

    Hi, Some clarifications on collations in MySQL 8.0.0:

    1) Sinhala was dropped from 8.0.0 and will be implemented together
    with a more general and improved implementation of collations for Indic
    scripts.

    2) Persian was dropped from 8.0.0 and will be implemented together
    with a more general and improved implementation of collations for Arabic
    scripts.

    4) We have chosen to use ISO-639 language codes instead of English
    names on the collations because we consider that more precise.

    3) utf8mb4_la_0900_ai_ci is, as you say, the "old and silly "Roman"
    collation" and is kept for legacy. The classical Latin language have
    no entry in CLDR. The table you point to, however,
    (http://unicode.org/charts/collation/) is not for the Classical Latin
    language but defines the basic order of the Latin *letters*. So all
    the new collations is based on UCA/CLDR with the exception of
    utf8mb4_la_0900_ai_ci. You may consider that a documentation bug.

    Bernt