Month: April 2016
MariaDB 10.2 Window Functions
Today the first MariaDB 10.2 alpha popped up and for the first time there is support for window functions.
I’ll describe what’s been announced, what’s been expected, comparisons to other DBMSs, problems (including crashes and wrong answers), how to prepare, what you can use as a substitute while you wait.
I assume some knowledge of what window functions are. If you’d prefer an introductory tutorial, I’d suggest reading articles like this one by Joe Celko before you continue reading this post.
What’s been announced
The MariaDB sources are:
The release notes
The source code trees — the feature tree up till now has been github.com/MariaDB/server/commits/bb-10.2-mdev9543 but the version-10.2 download page has more choices and is probably more stable.
Sergei Petrunia and Vicentiu Ciorbaru, two developers who I think deserve the credit, made a set of slides for a conference in Berlin earlier this month. It seems to have some typos but is the best description I’ve seen so far.
On Wednesday April 20 Mr Petrunia will give a talk at the Percona conference. Alas, it coincides with Konstantin Osipov’s talk about Tarantool — which I’ve done some work for — which, if you somehow haven’t heard, is a NoSQL DBMS that’s stable and faster than others according to independent benchmarks like the one from Coimbra. What a shame that two such important talks are scheduled for the same time.
Anyway, it’s clear that I’ll have to update this post as more things happen.
What’s been expected
There have been several wishes / feature requests for window functions over the years.
Typical feature requests or forum queries are “Oracle-like Analytic Function RANK() / DENSE_RANK() [in 2004]”, “analytical function like RANK etc to be implemented [in 2008]”, “Is MySQL planning to implement CTE and Window functions? [in 2010]”.
Typical blog posts are Shlomi Noach’s “Three wishes for a new year [in 2012]” and Baron Schwartz’s “Features I’d like in MySQL: windowing functions [in 2013]”..
Typical articles mentioning the MySQL/MariaDB lack of window functions are “What PostgreSQL has over other open source SQL databases” and “Window Functions Comparison …”.
So it’s clear that there has been steady demand, or reason for demand, over the years.
My first applause moment is: Mr Petrunia and Mr Ciorbaru have addressed something that’s been asked for, rather than what they wished had been asked for.
Comparisons to other DBMSs
I know twelve DBMSs that support window functions. No screen is wide enough for a chart showing them all, so I’ll just list their windows-function documents here so that you can click on the names to get the details:
APACHE DRILL,
CUBRID,
DB2 LUW,
DB2 z/OS 10,
DERBY,
FIREBIRD,
INFORMIX,
ORACLE,
POSTGRESQL,
SQL SERVER,
SYBASE,
TERADATA. I’ll show MariaDB against The Big Three.
These functions are mentioned in the standard document as required by optional feature T611 ELementary OLAP operations:
Function | MariaDB | Oracle | DB2 | SQL Server |
---|---|---|---|---|
DENSE_RANK | yes | yes | yes | yes |
RANK | yes | yes | yes | yes |
ROW_NUMBER | yes | yes | yes | yes |
These functions are mentioned in the standard document as required by optional feature T612 Advanced OLAP operations:
Function | MariaDB | Oracle | DB2 | SQL Server |
---|---|---|---|---|
CUME_DIST | yes | yes | no | yes |
PERCENT_RANK | yes | yes | no | yes |
These functions are mentioned in the standard document as required by optional features T614 through T617:
Function | MariaDB | Oracle | DB2 | SQL Server |
---|---|---|---|---|
FIRST_VALUE | no | yes | yes | yes |
LAG | no | yes | yes | yes |
LAST_VALUE | no | yes | yes | yes |
LEAD | no | yes | yes | yes |
NTH_VALUE | no | yes | no | no |
NTILE | yes | yes | no | yes |
These are common functions which are in the standard and which can be window functions:
Function | MariaDB | Oracle | DB2 | SQL Server |
---|---|---|---|---|
AVG | yes | yes | yes | yes |
COUNT | yes | yes | yes | yes |
COVAR_POP/SAMP | no | yes | yes | yes |
MAX | no | yes | yes | yes |
MIN | no | yes | yes | no |
SUM | yes | yes | yes | yes |
VAR_POP/SAMP | no | yes | yes | yes |
Yes MariaDB also supports non-standard functions like BIT_XOR, but they’re worthless for comparison purposes. What’s more important is that the MariaDB functions cannot be DISTINCT.
As for the options in OVER clause … just the important ones …
Function | MariaDB | Oracle | DB2 | SQL Server |
---|---|---|---|---|
ORDER BY | yes | yes | yes | yes |
NULLS FIRST|LAST | no | “yes” | yes | no |
PARTITION BY | yes | yes | yes | yes |
PRECEDING|FOLLOWING | sometimes | yes | yes | yes |
Those are the options that matter. The NULLS clause is important only because it shows how far an implementor will go to support the standard, rather than because most people care. MariaDB in effect supports NULLS HIGH|LOW, which is as good as Oracle — The Oracle manual puts it this way: “NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.” People who think that’s not cheating can add a comment at the end of this post.
From the above I suppose this second applause moment is justifiable: MariaDB has all the basics, and half of the advanced features that other DBMSs have.
Problems (including crashes and wrong answers)
The MariaDB announcement says:
“Do not use alpha releases on production systems! … Thanks, and enjoy MariaDB!”
Indeed anyone who used 10.2.0 in production would discover that enjoyable things can be bad for you.
I started with this database: …
create table t1 (s1 int, s2 char(5)); insert into t1 values (1,'a'); insert into t1 values (null,null); insert into t1 values (1,null); insert into t1 values (null,'a'); insert into t1 values (2,'b'); insert into t1 values (-1,'');
The following statements all cause the server to crash:
select row_number() over (); select 1 as a, row_number() over (order by a) from dual; select *, abs(row_number() over (order by s1)) - row_number() over (order by s1) as X from t1; select rank() over (order by avg(s1)) from t1;
The following statements all give the wrong answers:
select count(*) over (order by s2) from t1 where s2 is null; select *,dense_rank() over (order by s2 desc), dense_rank() over (order by s2) from t1; select *, sum(s1) over (order by s1) from t1 order by s1; select avg(s1), rank() over (order by s1) from t1;
The following statement causes the client to hang (it loops in mysql_store_result, I think this is the first time I’ve seen this type of error)
select *, avg(s1) over () from t1;
And now for the third applause line … to which you might be saying: huh? Aren’t those, er, less-than-desirable results? To which I would reply: yes, but two weeks ago there were far more and far bigger problems. We should be clapping for how quickly progress has been made, and guessing that this section of my post will be obsolete soon.
How to prepare
You have lots of time to get ready for 10.2, but may as well start now by getting rid of words that have special meaning for window functions.
The word OVER is reserved.
The newly supported function names — DENSE_RANK RANK ROW_NUMBER CUME_DIST PERCENT_RANK NTILE — are not reserved, and the names of functions which will probably be supported soon — FIRST_VALUE LAG LEAD LAST_VALUE NTH_VALUE — will probably not be reserved. But they might as well be, because you won’t be able to use those names for your own functions. Besides, they’re reserved in standard SQL.
What you can use as a substitute
Suppose you don’t want to wait till MariaDB is perfect, or you’d like to stay with MySQL (which as far as I know has made less progress than MariaDB toward this feature). Well, in short: gee that’s too bad. But I have seen three claims about getting a slight subset.
One: Shlomi Noach claimss you can use a trick with GROUP_CONCAT:
Two: Adrian Corston claims you can make delta functions with assignments.
Three: I claim that in ocelotgui you can put ‘row_number() over ()’ in a SELECT and get a row-number column even with older versions of MySQL or MariaDB (this is a recent change, it’s in the source not the binary).
In fact all the “window_function_name() OVER ()” functions could be done easily in the client, if they’re in the select list and not part of an expression,
and the result set is ordered. But I’m not sure whether that’s something to excite the populace.
There might be a “Four:”. I have not surveyed the various applications that can do cumulations. I suspect that mondrian is one, and open OLAP might be another, but haven’t looked at them.
Our own progress
For ocelotgui (Ocelot’s GUI client for MySQL and MariaDB) we had to adjust the syntax checker to highlight the new syntax in 10.2, as this screenshot shows
So we now can claim to have “the only native-Linux GUI that correctly recognizes MariaDB 10.2 window functions”. Catchy slogan, eh? The beta download is at https://github.com/ocelot-inc/ocelotgui. I still expect that it will be out of beta in a few weeks.