Category: MySQL


MySQL, GROUP BY, Select lists, and Standard SQL

The MySQL manual says: “In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.”

I got questioned about that last week, and this week I see that there’s a proposed change in behaviour for MySQL 5.7, so I guess it’s a good time to try to figure out what’s going on in this part of the world.

I checked the manuals for most current DBMSs and confirmed that the ONLY_FULL_GROUP_BY kind of behaviour is required by DB2, Derby, Firebird, HSQLDB, Informix, Mimer, Oracle, SQL Server, and Teradata. The only exceptions are Sybase and SQLite, and I’ll mention PostgreSQL later.

But returning an error is not really absolutely always the right thing.

Suppose your table was made this way:
CREATE TABLE t (a INT PRIMARY KEY, b INT);
And suppose you make this query:
SELECT a, b FROM t GROUP BY a;
This causes an error if sql_mode = ONLY_FULL_GROUP_BY.
But is it really an error?
The trick here is: if column a is the PRIMARY KEY (or a UNIQUE NOT NULL key), and you’re grouping by a, then it is logically impossible — impossible — that there could be two different values of column b.

One could say that “a determines b”, or that there is a “functional dependency” here. Roland Bouman talked about the matter in 2007 in an article “Debunking GROUP BY myths”. He made the point that, in fact, the current SQL standard does not really absolutely always require that there be an error if there’s a functional dependency.

So let’s look a bit more closely at what the standard says now. Sample:

2) Without Feature T301, “Functional dependencies”, in conforming SQL language, if QCR is a group-invariant column reference, then QCR shall be a reference to a grouping column of the qualifying query of QCR.

Arm-waving our way past the terminology, this kind of thing means that ONLY_FULL_GROUP_BY behaviour is a requirement — for any implementation that doesn’t claim to support Feature T301. Feature T301, eh? That’s not a core feature, in other words no SQL DBMS is required to support T301 in order to claim ANSI/ISO conformance. But it is standard. So one could argue that the MySQL manual is slightly wrong, it shouldn’t suggest that “the standard” requires ONLY_FULL_GROUP_BY. On the other hand, MySQL’s behaviour without ONLY_FULL_GROUP_BY is obviously non-standard, because there’s no checking whether something is “functionally dependent” or not.

What is to be done, then?

Well, it’s rather short notice to say that something will be in 5.7. There were occasional bitter experiences when incompatible behaviour was rushed into MySQL, and I thought the lessons were learned. I was wrong. I should have listened to Mr Kipling:

As it will be in the future, it was at the birth of Man
There are only four things certain since Social Progress began.
That the Dog returns to his Vomit and the Sow returns to her Mire,
And the burnt Fool’s bandaged finger goes wabbling back to the Fire.

Sure, but shouldn’t MySQL do something about that “functional dependency” stuff?

Well, PostgreSQL did. For the sort of item that I used in the example above, which by the way is called a “BPK-dependency” (base table primary key dependency), recent versions of PostgreSQL will say okay. As a result PostgreSQL now claims “partial” compliance with feature T301. But only partial!

The difficulty is that there are so many possibilities, as soon as you start to consider what should happen with views, left or right joins, subsets of multi-column keys, rollup, subqueries, and the many ways that a functional dependency can become known, besides the simple BPK case. Trudy Pelzer and I didn’t really get to grips with these arcana in our book “SQL-99 Complete, Really”, but Jim Melton and Alan Simon did in their book “SQL:1999 Understanding Relational Language Components”, for several dense pages. I’ll just quote the last sentences:

We realize that the material in this section is complex and abstract in the extreme. We have included it in spite of the complexity to serve readers who need to understand SQL:1999’s enhancements in updatability and who are sufficiently involved with data modeling to have or pursue the background in relational theory. However, in practical terms, we have rarely found ourselves using knowledge of all of these functional dependencies when writing our applications.

I think that it will be a long time before PostgreSQL will have full support, and this will cause confusion — people who understand the rationale will expect queries to work whenever they’re rational according to functional-dependency thoughts, and they won’t work. That doesn’t mean, though, that PostgreSQL is wrong — as I’ve said before, complaints about “half-baked features” are typically wrong. I see that the Derby folks are considering following the PostgreSQL lead. And maybe MySQL someday will too. There’s a worklog task, WL#2489, Fully compliant ONLY_FULL_GROUP_BY mode. Once again Oracle has hidden the task, once again it’s visible on wayback, but WL#2489 had no high-level architecture anyway. Well, that means the plan can be flexible.

And the bottom line is: MySQL should change, but no rush. Oracle should recall Saint Augustine’s prayer: “Lord, grant me chastity and continence, but not yet.” A variation could be: Lord, let me support standard SQL by default with Feature T301 … but not in 5.7.”

Kleidophobia

The old Greek word for “key” was KLEIS so someone who fears keys would be a kleidophobe, and a xenokleidophobe specifically fears foreign keys, which brings me to the MySQL 3.23 manual

5.4.5.1 Reasons NOT to Use Foreign Keys constraints
There are so many problems with foreign key constraints that we don’t know where to start:
Foreign key constraints make life very complicated …,
The speed impact is terrible …,
There is also a need to hold locks on many more tables …,
And so on.
And so on.

Of course those fierce words quietly disappeared from the manual some time after InnoDB came out with foreign key support in 2001. But since then the attempts to get foreign key support outside of InnoDB have usually gotten stuck.

* SolidDB had it, and I made a big deal out of that while predicting a bright future for the SolidDB storage engine.
* Falcon had it, although there were bugs in the implementation (they’re marked “to be fixed later”).
* PBXT had it
* IBMDB2I had it.

All of those storage engines are, for one reason or another, gone. So, is any independent storage engine left that supports foreign keys? I know of only one: ScaleDB. ScaleDB is targeted for clusters, and I’m not sure how much traction its makers are getting in that market, but I’m always joyous when I see a company paying attention to the standard feature requirements.

Anyway, to finish our look at MySQL documentation on the matter, I’ll quote once more:

“Foreign key support generic for all storage engines is scheduled for MySQL 6.x.”

This ancient promise is still in the current manual but I predict it too will silently disappear. The promise originally was made because there was progress on a project to do a lot for foreign keys, and the detailed plan is still publicly available on the mysql.com page for Worklog Item #148 (click the “High Level Architecture” button to view the important part). Though the project got put on the shelf, the worklog document is still good for seeing what, at the time, everyone agreed needed work.

The document’s “Feature Comparison Chart” compared InnoDB feature details with the ISO standard, SQL Server, DB2, Oracle, and PostgreSQL. It’s a long list. I’ll just pick the five standard “core” items where every vendor follows the standard. Except InnoDB.

1. References Primary/Unique

InnoDB lets foreign keys refer to keys that are not primary or unique.
For example:

CREATE TABLE pk (s1 CHAR, KEY (s1), s2 INT);
CREATE TABLE fk (s1 CHAR, FOREIGN KEY (s1) REFERENCES pk(s1));
INSERT INTO pk VALUES ('a',1),('a',2);
INSERT INTO fk VALUES ('a');

So now there’s a foreign key that points to two different parent-key rows. Guess what happens if you delete only one of the referenced-key rows, thus:

DELETE FROM pk WHERE s2 = 2;

If you guessed “that’s illegal because you can’t delete a row if there’s a foreign-key referring to it”, congratulations.
If you guessed “that’s legal because the foreign key is still referencing a valid undeleted row”, congratulations.
Either answer is logical, and it doesn’t matter which one is right.
What matters is that neither logic nor precedent are of any help when InnoDB goes boldly beyond both theory and common practice.

2. Default Primary Reference

It ought to be possible to just say that the foreign key references table PK, and not have to list the columns — if PK has a primary key, the DBMS can figure it out. But, partially as a result of the previous problem, InnoDB doesn’t figure it out.

3. Statement Check

Sometimes a check for a foreign-key violation should occur when the statement is truly over, when all other changes (due to triggers or cascading updates) have finished, so that the check isn’t affected by some temporary situation that would be cleared up before the statement ended.

4. Column REFERENCES

This looks okay, and there’s no error or warning:

CREATE TABLE pk (s1 INT);
CREATE TABLE fk (s1 INT REFERENCES pk(s1));

… but, woe to the uninitiated, it doesn’t make a foreign key. You can’t make a foreign key by putting a REFRENCES clause in the column definition.

5. REFERENCES Privilege

I suppose that newcomers might also try this:

GRANT REFERENCES ON t TO joe;

and think that they’re affecting security for foreign keys. In fact they’re creating a privilege that does nothing.

The current situation is less than perfect, and I cast the blame on all the closet xenokleidophobes. You know who you are.

← Previous page