MySQL, GROUP BY, Select lists, and Standard SQL

The MySQL manual says: “In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.”

I got questioned about that last week, and this week I see that there’s a proposed change in behaviour for MySQL 5.7, so I guess it’s a good time to try to figure out what’s going on in this part of the world.

I checked the manuals for most current DBMSs and confirmed that the ONLY_FULL_GROUP_BY kind of behaviour is required by DB2, Derby, Firebird, HSQLDB, Informix, Mimer, Oracle, SQL Server, and Teradata. The only exceptions are Sybase and SQLite, and I’ll mention PostgreSQL later.

But returning an error is not really absolutely always the right thing.

Suppose your table was made this way:
CREATE TABLE t (a INT PRIMARY KEY, b INT);
And suppose you make this query:
SELECT a, b FROM t GROUP BY a;
This causes an error if sql_mode = ONLY_FULL_GROUP_BY.
But is it really an error?
The trick here is: if column a is the PRIMARY KEY (or a UNIQUE NOT NULL key), and you’re grouping by a, then it is logically impossible — impossible — that there could be two different values of column b.

One could say that “a determines b”, or that there is a “functional dependency” here. Roland Bouman talked about the matter in 2007 in an article “Debunking GROUP BY myths”. He made the point that, in fact, the current SQL standard does not really absolutely always require that there be an error if there’s a functional dependency.

So let’s look a bit more closely at what the standard says now. Sample:

2) Without Feature T301, “Functional dependencies”, in conforming SQL language, if QCR is a group-invariant column reference, then QCR shall be a reference to a grouping column of the qualifying query of QCR.

Arm-waving our way past the terminology, this kind of thing means that ONLY_FULL_GROUP_BY behaviour is a requirement — for any implementation that doesn’t claim to support Feature T301. Feature T301, eh? That’s not a core feature, in other words no SQL DBMS is required to support T301 in order to claim ANSI/ISO conformance. But it is standard. So one could argue that the MySQL manual is slightly wrong, it shouldn’t suggest that “the standard” requires ONLY_FULL_GROUP_BY. On the other hand, MySQL’s behaviour without ONLY_FULL_GROUP_BY is obviously non-standard, because there’s no checking whether something is “functionally dependent” or not.

What is to be done, then?

Well, it’s rather short notice to say that something will be in 5.7. There were occasional bitter experiences when incompatible behaviour was rushed into MySQL, and I thought the lessons were learned. I was wrong. I should have listened to Mr Kipling:

As it will be in the future, it was at the birth of Man
There are only four things certain since Social Progress began.
That the Dog returns to his Vomit and the Sow returns to her Mire,
And the burnt Fool’s bandaged finger goes wabbling back to the Fire.

Sure, but shouldn’t MySQL do something about that “functional dependency” stuff?

Well, PostgreSQL did. For the sort of item that I used in the example above, which by the way is called a “BPK-dependency” (base table primary key dependency), recent versions of PostgreSQL will say okay. As a result PostgreSQL now claims “partial” compliance with feature T301. But only partial!

The difficulty is that there are so many possibilities, as soon as you start to consider what should happen with views, left or right joins, subsets of multi-column keys, rollup, subqueries, and the many ways that a functional dependency can become known, besides the simple BPK case. Trudy Pelzer and I didn’t really get to grips with these arcana in our book “SQL-99 Complete, Really”, but Jim Melton and Alan Simon did in their book “SQL:1999 Understanding Relational Language Components”, for several dense pages. I’ll just quote the last sentences:

We realize that the material in this section is complex and abstract in the extreme. We have included it in spite of the complexity to serve readers who need to understand SQL:1999’s enhancements in updatability and who are sufficiently involved with data modeling to have or pursue the background in relational theory. However, in practical terms, we have rarely found ourselves using knowledge of all of these functional dependencies when writing our applications.

I think that it will be a long time before PostgreSQL will have full support, and this will cause confusion — people who understand the rationale will expect queries to work whenever they’re rational according to functional-dependency thoughts, and they won’t work. That doesn’t mean, though, that PostgreSQL is wrong — as I’ve said before, complaints about “half-baked features” are typically wrong. I see that the Derby folks are considering following the PostgreSQL lead. And maybe MySQL someday will too. There’s a worklog task, WL#2489, Fully compliant ONLY_FULL_GROUP_BY mode. Once again Oracle has hidden the task, once again it’s visible on wayback, but WL#2489 had no high-level architecture anyway. Well, that means the plan can be flexible.

And the bottom line is: MySQL should change, but no rush. Oracle should recall Saint Augustine’s prayer: “Lord, grant me chastity and continence, but not yet.” A variation could be: Lord, let me support standard SQL by default with Feature T301 … but not in 5.7.”

, January 26, 2014. Category: MySQL.

About pgulutzan

Co-author of four computer books. Software Architect at MySQL/Sun/Oracle from 2003-2011, and at HP for a little while after that. Currently with Ocelot Computer Services Inc. in Edmonton Canada.