Version 2.1

Ocelotgui, the Ocelot Graphical User Interface, version 2.1, is now available for download. The principal new feature is the chart modifications described in the last post, Charts. Packages are for Linux or Windows, compatible DBMSs are MySQL or MariaDB or Tarantool. Get description and instructions and clone source from GitHub Code area, or get the release from GitHub Releases area.


SELECT 0,1,2,3,4,5,6,7,8; result can look like this

but should it?

I’ll try to cover here what would be an obvious way to show number columns in charts, or if not obvious then at least simple, or if not obvious or simple then at least understandable, or if none of the above, well, good looking.

The initial part of this description will look a bit like what you see with DB2 magic commands for bar or line or pie charts, but I like to think the later part of this description shows something more sophisticated.


Start with the most obvious: any group of numbers can be shown as a group of bars.

Should they be horizontal by default? That appears to be the less common choice, A Complete Guide to Bar Charts says “While the vertical bar chart is usually the default …”, ChartJs says “The default for this property is ‘x’ and thus will show vertical bars.”

But a SAS blog 3 reasons to prefer a horizontal bar chart persuaded me because, when we’re a GUI displaying an SQL result set, we usually have more room on the x (width) axis than on the y (height) axis.

Now, as for simpleness: that should mean there’s no extra work or time for exporting, manipulating in a non-SQL environment, learning Python, or even having to think — if the total time including user time is more than a second, that’s a fail. Therefore it has to be possible to get
SELECT 1,2,3;
which ordinarily displays as

to display as

with a shortcut that includes a mnemonic. Therefore hitting Alt+Shift+B should switch immediately to the bar-chart display.

But it can get more complex, and that means users must enter instructions, and the only way to make that simpler is to provide instruction syntax that looks like SQL, thus:

SET ocelot_grid_chart = 'bar vertical';

… since users will know that SET value = literal is the SQL way to assign, and will know that ''s enclose literals, and will know what vertical means. So this is the result and it won’t be surprising:

Another bar-chart variant is stacking, that is, putting 3 values in one bar instead of 3 side-by-side bars. For example

SET ocelot_grid_chart = 'bar stacked';


SET ocelot_grid_chart = 'bar vertical stacked';


Lines are a bit less obvious than bars, because if there’s only one number in the series, they’re meaningless. Nevertheless, there should be a simple way to show a chart. Therefore Alt+Shift+L will cause


Maybe, although I suppose everyone has seen this sort of thing before, I should make some terminology notes around now …
X-axis: usually this is the horizontal, with the word “samples”
Y-axis: usually this is the vertical, with the word “values”
Legend: in the upper right corner, with a colour icon and a short indicator
Ticks: the tiny marks along the values, so users can see what the scale is.

For the French terms, see Statistique Canada.

But if users don’t know the terms, that’s fine, they understand the picture anyway.


Pies are obvious and simple too, all they require is
Alt+Shift+P or

SET ocelot_grid_chart='pie';

and SELECT 1,2,3; will look like this:

Fine. But putting numbers along the x-axis and y-axis wouldn’t be informative, and nobody has figured out what should happen with negative numbers. I think Microsoft Excel would take the absolute value, I think it makes more sense to ignore, but the point is: no solution is intuitive for all, therefore there’s a gotcha, therefore it’s not obvious and simple after all.


Every illustration so far has been simple, a 3-number series. There’s no complicated decision-making, the algorithm is just “if it’s a number, i.e. any of the data types that the DBMS fits in a general numeric category, it’s in, otherwise it’s out”.
It would follow, then, that

>SELECT 1,2,3,'',4,5,6,'',7,8,9;

would result in this if the setting was for bar charts:

There are three “groups” here, and “groups” is a better word than “series” because GROUP is an SQL-standard term.


Now things get a bit more complex. What makes us decide that every individual numeric-column occurrence should be part of a separate bar or line or pie segment? Answer: nothing does, that’s simple but not really the best for representing some of the typical chart situations.

Take the possibility that all the columns which have something in common are combined. The commonness could be the value. For example

SET ocelot_grid_chart='bar vertical subgroup by value % 3';

SELECT 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15;

That is, the values which are divisible by 3 are together, but distributed across the landscape. Each of the things that are together are usually called “samples” if one uses chart terminology, and that’s why in this illustration the word “samples” appears at the bottom, and each subgroup has a separate colour, which is echoed by the legend.

More usefully, I think, the commonness could be the column_name prefix. The usefulness lies in the fact that users know how to put AS clauses after column names in the select list, so they can determine what the subgroups are. For example

SET ocelot_grid_chart='bar vertical subgroup by left(column_name, 1)';

SELECT 1 AS x_1, 2 AS y_1, 3 AS z_1, 4 AS x_2, 6 AS y_2, 7 AS z_2;

There might be other criteria for deciding how to subgroup, but they’re probably less important. What’s important is that once again the syntax is something any SQL user will understand.

Actually the word that chart-lovers prefer here is “categories”, and “subgroups” isn’t really a word you’ll find in typical SQL, but at the moment I’m thinking it’s “SQL-ish”.


Since this is SQL, NULLs exist.

There’s no such thing as obvious when NULL might mean unknown or inapplicable, but this isn’t Oracle so it would be too trite to say NULLs are zeros, at least when it’s possible to hint otherwise. Thus

SET ocelot_grid_chart='bar vertical subgroup by left(column_name, 1)';

SELECT 1 AS m, 2 AS m, 3 AS m, 4 AS o, 5 AS o, NULL AS o;

doesn’t use a solid colour, but has the same size as whatever the largest bar is.

I worry that by now we’ve evolved from “obvious” to merely “understandable”, so I’ll explain this example more deeply. In this example there are three samples of ‘m’ and ‘o’. The first sample, label 0, has the first ‘m’ and it has value = 1, the black bar, and it has the first ‘o’ and it has value = 4, the red bar. Sample 1 has the second ‘m’ and the second ‘o’. Sample 2 has the third ‘m’ and ‘o’, and, since the third ‘o’ is NULL, it is not solid. The legend in the right corner is helpful for showing that
black is ‘m’ and red is ‘o’.

WHERE clauses

But surely not every numeric column should always be in a chart? I think of column names that end with ‘_id’, or the possibility that one will want to display both the chart and the numbers as text.

The SQL syntax for deciding what’s in a group could be CASE … WHEN, or IF … THEN, but what even the novice SQL user has seen is WHERE. The condition could be COLUMN_NAME relative-operator value, COLUMN_NUMBER relative-operator value, COLUMN_TYPE relative-operator value. By the way, since this is a GUI, users will see hints for autocompletion when they type WHERE, for example

and after

SET ocelot_grid_chart='bar vertical stacked' WHERE column_type = 'integer';

SELECT 1,1.1,2,7,-44;

the result will not include 1.1 in a chart because it is not an integer.

And this is where it becomes possible at last to show the statements that caused the example at the start of this post, where some of the columns were in a bar chart, some in a line chart, and some in a pie.

SET ocelot_grid_chart = 'line' WHERE column_number <= 3;
SET ocelot_grid_chart = 'bar' WHERE column_number >= 4 AND column_number <= 6;
SET ocelot_grid_chart = 'pie' WHERE column_number > 6;
SET ocelot_grid_chart = '' WHERE column_number > 100;
SELECT 0,1,2,3,4,5,6,7,8;

and we’ve arrived at the example we started with.

Turning it off

Alt+Shift+N or

SET ocelot_grid_chart='';
will erase all earlier chart statements. So there’s a simple way to go back to the regular way to display results.

Source Code

My main purpose describing all this is to show what the design of a GUI should look like, if my observations about goals make sense.

The illustrations show what the implementation looks like as of today, they’re screenshots of ocelotgui, and they work as I’ve described. But currently they’re only distributed in source code. Anyone who wants to reproduce them today can download the source from github, follow the special build instructions in file chart_options.txt, connect to MySQL or MariaDB or Tarantool, and see the same things. You will notice that you need to use the Qwt library, which should be no problem (ocelotgui is GPLv2, Qwt is LGPL, Qwt is on many distros). Pay no attention to the bug in the room.

But I’m not sure whether to call this the final look for the next executable release. Let me know in the comments if I’m missing tricks.

SQL:2023 and how well MySQL and MariaDB comply

SQL:2023 is now the official SQL standard. I’ll compare some of the new optional features with what MySQL and MariaDB support already, and say whether I judge they are compliant.

At the moment I don’t have draft documents (my membership on the standard committee ended long ago), so I gratefully get my news from Oracle and Peter Eisentraut. It’s early days, I may be making mistakes and may be guessing wrong about what new features are important.


MySQL and MariaDB examples:

 SELECT LPAD('hi',3,'?'); causes '?hi'.
 SELECT RPAD('hi',3,'?'); causes 'hi?'.

Judgment: compliant with SQL:2023 optional feature T055.

The third argument is optional and default is the space character. For some reason I’m not seeing the documentation in the manuals, but it is standard behaviour.


MySQL and MariaDB examples:

  SELECT LTRIM(' *'); -- causes '*'.
  SELECT RTRIM('* '); -- causes '*'.

Judgment: not compliant with SQL:2023 optional feature T056.

Although SQL:2023 now has LTRIM and RTRIM, they are two-argument,
perhaps a bit like Oracle.

Also SQL:2023 has BTRIM for trimming from both ends. Although MySQL and MariaDB have TRIM(BOTH …), it’s not quite the same thing.

VARCHAR instead of VARCHAR(length)

MySQL and MariaDB example:

  CREATE TABLE t(s1 VARCHAR(1)); -- (length) is mandatory

Judgment: not compliant with SQL:2023 optional feature T081

A DBMS that supports feature T081 doesn’t have to include a maximum length, it’s up to the implementation, which in this situation could be 65535 bytes. This is unfortunate and it might be my fault, I think I used to insist that length should be mandatory in MySQL.

Although I can use the TEXT data type instead for indefinite-length strings, it has different characteristics.

ANY_VALUE, emphasizing use with GROUP BY

MySQL and MariaDB example:

 SELECT s1, s2 FROM t GROUP BY s1;

Judgment: compliant with SQL:2023 optional feature T626, almost, alas

Probably you noticed that s2 is not in the GROUP BY clause, and yet the syntax is legal — any of the original t values could appear.

The standard would yield something similar with different syntax: SELECT s1, ANY_VALUE(s2) FROM t GROUP BY s1;

Or, as Oracle describes it, “[ANY_VALUE] eliminates the necessity to specify every column as part of the GROUP BY clause”.

That means it’s non-deterministic. Usually I’ve cared more about non-standardness. But now that argument is weaker.

ORDER BY, emphasizing use with GROUP BY

MySQL and MariaDB example:

  CREATE TABLE t (s1 int, s2 int, UNIQUE (s2));
  INSERT INTO t VALUES (1,1),(7,2),(5,3);

Judgment: not compliant with SQL:2023 optional feature F868.

I’m probably misunderstanding something here about the intent, but the effect is that something that’s not in the GROUP BY clause can determine the ordering. In this case, because column s3 is unique, it’s maybe not utterly absurd. But in MySQL and MariaDB, the result is not guaranteed to be in order by column s3.


MySQL and MariaDB syntax:

  SELECT GREATEST(2,1,3); -- result = 3
  SELECT LEAST(2,1,3); -- result = 1

Judgment: compliant with SQL:2023 optional feature T054


MySQL and MariaDB example:


Judgment: partly compliant with SQL:2023 feature F292 unique null treatment

To understand this, your required reading is my old blog post
You have to read the whole thing to understand that the
standard document was not contradictory, and that MySQL
and MariaDB have always been compliant, because …
In a UNIQUE column, you can insert NULLs multiple times.

However, because lots of people never understood, there are implementations where you cannot do so. Therefore feature F292 allows new syntax:


MySQL and MariaDB could easily half-comply by allowing the first option and saying it is the default. (The standard now says the default is implementation-defined.) (A backward step.)

Hexadecimal and octal and binary integer literals

MySQL and MariaDB example:

  SELECT 0x4141;

Judgment: not compliant with SQL:2023 optional feature T661

Did you notice that I said “integer literals”? That’s what’s wrong, 0x4141 in standard SQL is now supposed to be an integer, decimal 16705. But MySQL and MariaDB will treat it as a string, ‘AA’.

It is always legal for an implementation to have “non-conforming” syntax that causes “implementation-dependent” results. That’s what happened with 0x4141, in the mists of time. But now, if one decides to “conform” with the syntax of this optional feature, one must change the behaviour drastically: not merely a different value but a different data type.

I suppose that the road to compliance would involve deprecations, warnings, sql_mode changes, etc., lasting years.

Underscore digit separators

MySQL and MariaDB example:

  CREATE TABLE t (10_000_000 INT);
  SELECT 10_000_000;

Judgment: not compliant with SQL:2023 optional feature T662 Underscores in numeric literals

Big numbers like 10,000,000 cannot be expressed in SQL using commas because that looks like three separate values. The solution, which I see was implemented a few years ago in Python, is to express 10,000,000 as 10_000_000 — that is, underscores instead of commas.

In standard SQL, an identifier cannot begin with a digit. But in MySQL and MariaDB, it can, so 10_000_000 is a legal regular identifier. This is another example of an extension that causes trouble in a long term.

There is an idiom about this, usually about chickens: “come home to roost”. As Cambridge Dictionary puts it: “If a past action, mistake, etc. comes home to roost, it causes problems at a later date, especially when this is expected or seems deserved …”


MariaDB example:


Judgment: partial compliance with SQL:2023 feature T801 JSON data type etc.

I’m not going to try to analyze all the new JSON options because it’s a big subject. Also there’s a CYCLE clause and a new section of the standard, part 16 Property Graph Queries (SQL/PGQ), which I don’t think is appropriate to get into during this comparison of features in the foundation. Maybe some other time.


Actually, despite pointing to some non-compliance, I have to give credit to the people who decided to implement some features long before they became standard. (And I don’t credit myself at all because they appeared before I was a MySQL architect, and in fact I objected to some of them while I was with MySQL AB.)

Well done.

ocelotgui news

The new version of the Ocelot GUI application, ocelotgui, is 2.0. As always the source and executable files are on github. The biggest new feature is charts. I’ll say more in a later blog post.

Update added 2023-06-12: Re ANY_VALUE: see comment by Øystein Grøvlen. As for the question from Federico Razzoli, I cannot give an authoritative answer. Also I’ve been reminded that MariaDB request MDEV-10426 regarding ANY_VALUE is marked as part of MariaDB 11.2 flow, but until then I guess MySQL is a bit ahead.

Delimiters in MySQL and MariaDB

In the SQL standard delimiters are for separating tokens; in MySQL/MariaDB they’re for ending statements, particularly compound statements as in

  VALUES (5);

The problem is that a dumb client could stop at the first semicolon “;” in the statement (the one after VALUES (5)) instead of the one that truly ends (the one after END). The solution is to say that semicolon is no longer the default delimiter, and the way to specify a replacement is with a non-standard DELIMITER statement, such as


and after that the true statement end is //.

The idea isn’t unusual — in PostgreSQL’s psql client you can enclose compound statements with $$ which they call dollar quoting, with Microsoft you get used to saying “go” when you want something shipped to the server, with Oracle you might sometimes see SQLTERMINATOR. But it shouldn’t always be necessary — despite the fact that words like BEGIN are not reserved, a client with even a slight ability to recognize syntax can see where a compound statement ends without needing special tags.


Some DELIMITER quirks are documented in either the MySQL manual or MariaDB manual, some are common to all mysql-client commands, but all are weird. They are:
() DELIMITER itself is not delimited
() After DELIMITER A B the delimiter is A
() After DELIMITER A; the delimiter is A;
() After DELIMITER 12345678901234567890 the delimiter is 123456789012345
() After DELIMITER ‘x’ or DELIMITER “x” or DELIMITER `x` the delimiter is x
() After DELIMITER ;; the delimiter is ;;
() Delimiters inside comments or character string literals or delimited identifiers are ignored
() After \dA the delimiter is A (\d is a substitute for DELIMITER and no space is necessary)
() After DELIMITER \ the effect can be odd, the MySQL manual recommends not to use \
() SELECT 5; DELIMITER ; will cause an error message
() DELIMITER ;; followed by SELECT 5;;; has a different effect from DELIMITER // followed by SELECT 5;//
() \G and \g will work whether or not there is a delimiter

It is not a gotcha that, when statements are in a .sql file and read by the SOURCE file-name statement, the rules are different: Compound statements will be recognized without special delimiters, and semicolons will always end non-compound statements even after a different delimiter has been specified.

It is not a quirk or a gotcha, it’s merely bad design, that DELIMITER forces the fundamental requirement of a parser — the tokenizer — to change what the tokenizing rules are while the tokenizing is going on.

Best practice

Both the MySQL and the MariaDB manual have example statements where the delimiter is // and that’s unlikely to cause confusion, since the only other uses of / are as a beginning of a bracketed comment or as a division operator, or in arguments to PROMPT or SOURCE or TEE.
Less frequently there is a recommendation for $$ but that is bad because $$ can occur inside regular identifiers.
DELIMITER !! is equally bad because !, unlike /, is a repeatable operator, for example SELECT !!5; returns 1.
DELIMITER ;; is worse and has caused bugs in the past, but the mysqldump utility will generate it before CREATE TRIGGER statements.

Avoid non-printable or non-ASCII characters, some clients will support them but others won’t.

You could set a delimiter at the start of a session and let it persist. But I believe the more common practice is to put DELIMITER [something appropriate] immediately before any CREATE PROCEDURE or CREATE TRIGGER or CREATE FUNCTION or BEGIN NOT ATOMIC statement, and DELIMITER ; immediately after. This might help readability by humans since otherwise they’d look at statements that end with mere semicolons and assume they’d go to the server immediately, not realizing that long ago there was a DELIMITER statement.

See also the Descriptive SQL style guide section “Semicolons”.


The ocelotgui client doesn’t need DELIMITER statements because obviously a client that includes a stored-procedure debugger can recognize compound statements, but it will accept them because it tries to be compatible with the mysql client. It does not duplicate mysql client behaviour for the SOURCE statement but that should be fixed in the next version, and the fix is now in the source code which can be downloaded from github.

Draw Database Diagrams

It’s pleasant to admire drawings of database tables and their foreign-key relationships, which are “data structure diagrams” or more grandiosely “entity relationship (ER) diagrams”. I’ll show what the esthetic considerations are i.e. “does this look nice”, and what the engineering considerations are i.e. “what do I have to do to make it at all”. I’ll begin with the conclusion — here is a drawing of the well-known sakila database.
To find diagrams that other products draw for the same database, just google “sakila er diagram”. You’ll see something like this:
and what I’m trying to address first is: who’s the fairest?


Esthetics, also spelled aesthetics, is “the formal study of the principles of art and beauty”. And there really have been formal studies where academics look for criteria that people judge are important. A fine description is in a Radboud University thesis by Bart van de Put, “Scoring Entity-Relationship Diagrams Drawn by a Computer Algorithm”. I will consider five criteria that the thesis mentions: Bends when a line makes a sharp direction change. Crossings when two lines intersect or nearly intersect.
Edge non-orthogonality when lines are not all horizontal or vertical with right angles.
Node non-orthogonality when tables are not in a fixed rank-and-file grid. Inconsistent line lengths when some of a table’s relationships are short and others are long.

Although a computer program may not be great as a beauty contest judge, it can simply count bends and crossings, assign numeric weights to them, and come up with a negative score. Then picking for display a drawing with the best score can be achieved with brute-force comparisons or with functions that are slightly more sophisticated but enormously harder to write.


Unlike what happens in most of the google-result diagrams I mentioned above, I just declare “there are no bends”. All lines are straight, with many more angles than 90-degree up/down/left/right.
For example this is a table that had foreign-key relationships with 8 other tables. Bend count: zero.

Let’s admit the downsides too:
(a) The original idea of structure diagrams was that lines could be done with as few as four ASCII characters – | > < and so were easy to reproduce. Drawing angles is perhaps technically possible with ASCII art but would need more screen space.
(b) In the situations where the classic style would bend around, the straight-line style will draw right through a table. This can be illustrated with a sakila diagram that has all the columns, and the line between rental and inventory goes right over the staff table.
There’s an option to put the line in the background instead
but either way, there’s an esthetic problem which should be weighted the same as a count of bends, or higher.
(c) we can’t put labels on any non-horizontal lines, so the only way to know the foreign-key constraint is to hover over the line and wait for a tooltip to appear.


Unlike the monochrome google-result diagrams I mentioned above, I just declare “different lines have different colours”. This has limited possibilities since the palette has to consist of only colours that are easily distinguishable and are not similar to the background colour, which would have a camouflage effect. So the practical limit is about a dozen colours. However, usually, when two lines are close or intersecting it’s easy to distinguish them.

Let’s admit the downside here too:
(a) this doesn’t reduce the number of crossings, it just patches over the problem
(b) it’s a matter of chance whether a bunch of colours is esthetic or not.

Edge non-orthogonality

The algorithm produces straight lines and almost no right angles so it is almost always edge non-orthogonal. As the Radboud University thesis says, edge non-orthogonality is considered a negative. But the thesis also admits that there are unnecessary bends otherwise. I’m giving bends more weight. I see in other products that edge orthogonality has more weight.

Node non-orthogonality

The algorithm places high weight on node orthogonality and one result is that tables are in fixed ranks and files, always. Again, this is unlike what I see in other products. I suppose that their criterion is compactness: if there are fewer spaces between tables, then there is less scrolling necessary to see the whole picture. So this is another case of a beautification with a downside, but I insist on it because of another consideration: specifiability.

Users will sometimes want to specify, instead of “ERDIAGRAM OF whole_schema”, “ERDIAGRAM OF schema TABLES (table-list)”. Once we allow that much, it’s reasonably simple to allow for users to say which tables should go in which fixed positions. For example, consider this specification
SHOW ERDIAGRAM OF sakila TABLES (staff 0 1, inventory 1 0, rental 2 1)
which results in this
Users can figure out graphic coordinates such as 0 1 “position 0 on the X axis position 1 on the Y axis”. They would be , however, if the coordinates had to be pixel positions instead. Having straight ranks and files makes it possible to specify with small integers.

By the way, the automatic production of drawings based on databases is sometimes called “reverse engineering”. So I suppose the appropriate term for specifying a diagram like this is “forward engineering”, but the real engineering job is of course the CREATE TABLE statements.


Suppose we have three tables A B C, and A references C, well then we wouldn’t want to draw B between A and C, we’d want the diagram to look like A C B. In other words, if we could just get related tables to be drawn close to each other with no intervening tables, we’d have no bends. For a tiny number of tables this is easy to see and to fix. But as the number of tables grows the possible number of possible table positions in the drawing grows geometrically, like a travelling salesman problem. So positioning is far more difficult than bend denialism or crossings anti-camouflage.

The Radboud University thesis mentions a tried-and-mathematically-solid way to get the fewest bends with a large number of related tables. I saw it too late, though, so the method I implemented might be inferior. Also I might have saved time by bringing in the Open Graph Drawing Algorithm library, but didn’t.

The method is like crystal formation.

Start by drawing the table A that has the most relationships, either referencing-from or referenced-to. Then, for each table that A is related to, draw nearby (as closely as possible without interfering with other tables) the related tables … starting with the table B which is related to A which has the most relationships, and so on until there are no more. This involves a lot of recursion.

The initial result of a crystal formation is sometimes pretty good, and then there are last-microsecond patchups that can enhance it. Look for pairs of tables that are not near each other and see whether, if they were exchanged with each other or if one of them was moved close to the other, the total bend count for the table would go down. This usually helps just a little bit and adds more than a little bit to the computer time, so it can only go on for a limited number of looks.


So much for the esthetics that are specifically for ER diagrams. I suppose I should add that there have to be further esthetic features that apply to widgets in general. The diagram must be changeable with respect to fonts, header and detail background colours, and line widths.
For example, for this diagram of a limited number of tables in sakila I specified that column names ending with _id should be in bold, the lines should be extra wide, and header background should be yellow.

(One thing users can’t change is the way that an arrow looks, the so-called Crow’s foot. At least one other product offers choices for this.)

I imagine that other high-class GUI clients can do such touchups so this is nothing special, but in a post about esthetics I figure I ought to mention this final step.

But where’s the SQL?

In this case the SQL statements are the easy part, especially if the DBMS supports SQL-standard INFORMATION_SCHEMA access.

All the necessary information about foreign-key relationships can be picked up with
There’s a quirk that is non-standard: your favourite DBMS insists that primary-key constraints must be named PRIMARY, so KEY_COLUMN_USAGE has all we need for deciding which columns are part of the a primary key, as well as which columns are part of a foreign key. And if the user specifies that the drawing should have all columns instead of just primary-key columns,
does the job.

However, I don’t think it’s a good idea for the program to do those SELECTS every time the user asks for a drawing. They’re expensive on the server. (Rule: taking time on a client computer is trivial, taking time on the server might not be.) So this is done as part of an “explorer”, which means the user can slowly slowly pick up all the table and column and relationship information at once, so it’s stored in the local computer’s memory, and subsequent requests to draw don’t affect the server at all unless the user decides to refresh potentially-obsolete information.


The feature I’ve just described comes with ocelotgui 1.9 which is available for download on github.

Foreign Key Displays

For foreign keys, I like to see what references what, It’s easy to find but the challenge is to decide how to display.

The cute

Showing a result with the mysql client:

prompt> call show_fk('t2');
| referencing | constraint_name | referenced |
| t2          | mx              | t3         |
| s1          |                 | s1         |
| s2          |                 | s2         |
|             |                 |            |
| t2          | t2_ibfk_1       | t1         |
| s1          |                 | s1         |
6 rows in set (0.08 sec)

Query OK, 6 rows affected (0.08 sec)

That is, for table T2, I show the table name and foreign-key name and referenced-table name, then the columns of that table in the order they appear in the foreign key, then if there is another foreign key repeat after a blank line.

Here is a long but simple stored procedure that produces such a display.

CREATE PROCEDURE show_fk(ref_table_name VARCHAR(128))
 DECLARE d_table_name VARCHAR(128);
 DECLARE d_column_name VARCHAR(128);
 DECLARE d_ordinal_position INT;
 DECLARE d_constraint_name VARCHAR(128);
 DECLARE d_referenced_table_name VARCHAR(128);
 DECLARE d_referenced_column_name VARCHAR(128);
 SELECT table_name, column_name, ordinal_position,
 referenced_table_name, referenced_column_name
 FROM information_schema.key_column_usage
 WHERE table_name = ref_table_name AND referenced_column_name IS NOT NULL
 ORDER BY constraint_name, ordinal_position;
 CREATE TEMPORARY TABLE fks (referencing VARCHAR(128),
                             constraint_name VARCHAR(128),
                             referenced VARCHAR(128));
 OPEN x;
 WHILE err = 0 DO
   FETCH x INTO d_table_name,d_column_name, d_ordinal_position,
              d_referenced_table_name, d_referenced_column_name;
   IF err = 0 THEN
     IF counter <> 0 AND d_ordinal_position = 1 THEN
       INSERT INTO fks VALUES ('', '', '');
     END IF;
     IF d_ordinal_position = 1 THEN
       INSERT INTO fks VALUES (d_table_name, d_constraint_name, d_referenced_table_name);
     END IF;
     INSERT INTO fks VALUES (d_column_name, '', d_referenced_column_name);
     SET counter = counter + 1;
   END IF;
CALL show_fk('t2');

This is vaguely like an entity-relationship diagram, but with tables rather than pictures.

The flaws are: (1) it needs extra privileges, (2) it mixes different object types. So let’s look at the simpler and more common type of display.

The usual

Everything necessary can come from information_schema.key_column_usage.

For example, there’s a GUI that displays with these columns:

Name | Schema | Table | Column |Referenced Schema | Referenced Table | Referenced Column

That’s easy to reproduce with

SELECT constraint_name AS `Name`,
 table_schema AS `Schema`,
 table_name AS `Table`,
 column_name AS `Column`,
 referenced_table_schema AS `Referenced Schema`,
 referenced_table_name AS `Referenced Table`,
 referenced_column_name AS `Referenced Column`
FROM information_schema.key_column_usage
WHERE referenced_column_name IS NOT NULL
ORDER BY constraint_name, ordinal_position;

Or if that takes too long to type, make a view.

There’s another GUI that displays with these columns:

Key name | Columns | Reference Table | Foreign Columns | On UPDATE | On DELETE

(The “On UPDATE” and “On DELETE” values would have to come from information_schema.referential_constraints.)

The objection that I’d make is that such headers are not standard. So anybody who knows the actual column names has to do a double take, wondering whether the first column is the same as “constraint_name” or something exotic, and so on. Use of multiple different names for the same thing is poetry not programming.

The new

So I think this display, which admittedly makes cosmetic changes (replacing ‘_’ with ‘ ‘ and changing upper case to mixed case and emphasizing one column) is better:

I’ve made it so that can come from user statements or from the explorer. The source code is downloadable now and the released executables will come soon.

Offset and fetch clauses in MariaDB

A new feature in MariaDB 10.6 is “Implement SQL-standard SELECT … OFFSET … FETCH”.

I’ll say what’s not in the manual: mention other vendors, add details, talk style, show how to get the wrong answer, describe WITH TIES, mention alternatives, experiment with locks, and list 11 minor flaws.

Other vendors

Products with support for OFFSET … FETCH:
Oracle, DB2, SQL Server.

Products with support for LIMIT … OFFSET:
Tarantool, MySQL, Vertica, PostgreSQL.

Products with support for neither:
Firebird, ScyllaDB, Actian.

This isn’t a complete list — there are lots of SQL products nowadays — but it does include the “Big Three”. All of them support SELECT … OFFSET … FETCH.

Syntax details

The BNF actually is:

[ OFFSET start { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ]

The “start” and count” must be unsigned integers if they’re literals, but if they’re declared variables then they can be NULL (in which case they’re treated as zeros), or can be negative (in which case they’re treated as -infinity for “start” and +infinity for “count”) (but depending on undefined behaviour isn’t always fun).

The lowest offset is zero which is the first row, and the lowest count is zero, that is, it is possible to select zero rows. If count is omitted, then 1 is assumed.


If you haven’t read the descriptive SQL style guide and clicked like, then you won’t understand this section, skip it.

OFFSET and FETCH are both either “clause starts” or “sub-clause starts” so if the choice is to put separate clauses on separate lines, then they are.

Deciding whether to use ROW rather than ROWS, or FIRST rather than NEXT, (neither of which make a difference for server behaviour), one could base a rule on English grammar:
(1) If there was an OFFSET clause with a non-zero number-of-rows, then NEXT, else FIRST. If number-of-rows is greater than one, or unknown, then ROWS, else ROW.
or one could base a rule on what comes first in the MariaDB manual BNF:
(2) The BNF has {FIRST|NEXT} and {ROW|ROWS} so FIRST and ROW.
or one could base a rule on what the examples in the manual use most often:
(3) FIRST and ROWS
or one could base a rule on whatever is shortest:
(4) NEXT and ROW.

I like (3), for example


but there’s no established rule yet so I’ll vary the format in what follows.

How to get the wrong answer

Try this:

CREATE TABLE td (char_1_column CHAR(1) DEFAULT '1',
                 null_column INT,
                 char_2_column CHAR(2));
CREATE INDEX int_char_1_column_plus_char_2_column
  ON td (char_1_column, char_2_column);
INSERT INTO td (char_2_column)
  VALUES ('a'),('b'),('c'),('d'),('e'),('ab');
SELECT char_2_column, null_column
  FROM td
  WHERE char_1_column > '0'
SELECT char_2_column, null_column
  FROM td WHERE char_1_column > '0'

Result of first SELECT:

Result of second SELECT:

So there’s no (‘d’,NULL) and there’s a duplicate (‘ab’,NULL). Wrong answer.

I don’t think it’s a bug because, in the absence of an ORDER BY statement, there is no guarantee which rows will be skipped in the second SELECT. But luckily it’s due to an exotic situation which is easy to avoid: the combined index on a non-unique column and a multi-character string column. Just don’t use the index, and all is well.


The effects of the new syntax are mostly the same as with LIMIT … OFFSET, but WITH TIES is new.

Suppose there are 6 rows, selected in order (there must be an ORDER BY clause):
Thus WITH TIES doesn’t mean that the two values are technically “equal”, they merely must be peers, so NULLS tie. The interesting thing is that when you ask for “NEXT 1 ROW” you get the next 2 rows. But that only applies to FETCH, it doesn’t apply to OFFSET, therefore this could be a mistake:


With the example 6 rows, the second SELECT will return a row that was already returned for the first SELECT. It would be better to say, instead of “OFFSET 1 ROWS”, “OFFSET (number of rows returned by the previous SELECT) ROWS”.


A standard way to fetch only one row:

  OPEN c;

A non-standard way to fetch only one row: (remember to restore @@sql_select_limit later):

SET sql_select_limit = 1;

A way to offset by 10 and fetch 10 (this assumes that pk has all the integers in this range i.e. it depends on user knowing about contents and indexes, and assumes that user only wants to go forward without skipping)

WHERE pk > 10 AND pk <= 10 + 10;

A way to let the user do the offsets and fetches (this depends on the client being a GUI and the user knowing how to click on a scroll bar)


… although users often only look at the first few pages (think of how often you use Google and only click Next a few dozen times).


Will locking, or SET TRANSACTION ISOLATION LEVEL, guarantee consistency?

For a test, I started two clients. On the first, I said:

  WHILE v <= 10000 DO
    SET v = v + 1;
SET @@autocommit=0;
  WHERE pk > 400

On the second, I said:

SET @@autocommit=0;
DELETE FROM pks WHERE pk = 7227;
DELETE FROM pks WHERE pk = 405;
DELETE FROM pks WHERE pk = 411;

Result: The first DELETE succeeds, the second DELETE hangs until I say ROLLBACK; on the first client. The third DELETE would also have hung if I had said it first.

In other words: the rows between 401 and 410 (the rows skipped due to OFFSET) are locked, and the rows between 411 and 412 (the rows picked up by FETCH) are locked, but the rows after 412 (the rest of the rows matching WHERE) are not locked.

What this means, of course, is that ordinarily you can’t be certain that you’re paging through a consistent set. If this is a concern (it shouldn’t always be), make a copy on either the server or the client.

Minor Flaws

A few incompatibilities with standard SQL or another DBMS or another bit of MariaDB 10.10 syntax.

Table t has 4 rows created thus:

INSERT INTO t VALUES (1,2),(2,1),(3,2),(4,1);



Result: Success, all 4 rows in table t are returned. It appears that the OFFSET clause is ignored.



Result: Error 1691 (HY000) A variable of a non-integer based type in LIMIT clause
Never mind whether var1 should be accepted. The error message is obsolete, I was not using a LIMIT clause.



Result: Success, all rows are returned.
Sometimes non-integer variables are illegal, but here a BIT variable is legal.
MariaDB calls BIT a numeric data type but SELECT * FROM t OFFSET b’1′ ROWS wouldn’t work.


SET sql_mode='oracle';
  var1 NUMBER := 1;
    SELECT * FROM t OFFSET var1 rows;

(Don’t forget to restore sql_mode later.)
Result: Error 1691 (HY000) A variable of a non-integer based type in LIMIT clause
That’s right for standard SQL (“The declared type … shall be an exact numeric with scale 0”) but Oracle says it might be more forgiving (“If rowcount includes a fraction, then the fractional portion is truncated.”)



Result: Success, zero rows are returned.
That’s wrong for standard SQL, which would return “data exception — invalid row count in fetch first clause”.



Result: Success.
But the manual’s BNF is

OFFSET start { ROW | ROWS }
FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }

which might suggest to newbies that the OFFSET clause is compulsory before FETCH.


SET @x = 1;

Result: Syntax error.
I can find, by saying CREATE TABLE .. AS SELECT @x; and then saying SHOW CREATE TABLE or SELECT … FROM INFORMATION_SCHEMA, that @x is BIGINT. BIGINT declared variables are legal, and I can use variables that start with @ in some other situations. This is how things go when variables have variable types.



Result: Success. I believe this is reasonable, and of course any non-standard syntax can be handled in an implementor-defined way. But it’s not documented on the documentation page for SHOW WARNINGS.



Result: Success.
In standard SQL:2016 ROW and ONLY are reserved words.
MariaDB tends to accept goofiness since it’s not your nanny, but I believe in egregious cases the standard strictness is preferable after reasonable notice.



Result: Error.
Another inconsistency, since LIMIT 1 ROWS EXAMINED 1 is legal.



Result: Success.
This is fine, but once again undocumented.

ocelotgui support

The Ocelot GUI client recognizes MariaDB syntax when connected to a MariaDB server, so if you need features like autocomplete after OFFSET to be correct then download it from github.

Duplicate column names

A former colleague from my MySQL AB days asked me about
SELECT pk, SUM(amount) AS pk FROM t ORDER BY pk;
should the duplicate use of the name PK be illegal?

My reply is: A relational-database expert deplores it in the select list; the SQL standard says it’s conceptually illegal in the order-by clause; MySQL and MariaDB handle it in an odd way; the Ocelot GUI for MySQL and MariaDB doesn’t flag it.

A relational-database expert deplores

CJ.Date, in Date On Databases, Writings 2000-2006, has a multi-page section about “Duplicate Column Names”. I’ll just quote the firmest negatives.

… we can avoid the duplicate names if we want to. The problem is, we don’t have to, and SQL does not have to deal with the possibility that a given table might have column names that aren’t unique. As an aside, I should explain that … we don’t have to use the “AS column name” construct is because it wasn’t part of the original standard (it was introduced with SQL:1992), and compatibility therefore dictates that its use has to be optional. (I should also note in passing that — believe it or not — names introduced by AS don’t have to be unique!

*Every SQL table has a left-to-right column ordering *Every column in every named SQL table (i.e. base table or view) has a user-known name, and that name is unique within the table in question. But neither of these properties holds, in general, for unnamed tables (i.e. intermediate and final result tables). … I’d like to point out that duplicate and missing column names both constitute a fairly egregious violation of The Naming Principle

So nothing’s ambiguous about Mr Date’s evaluation, but he is acknowledging that SQL won’t be changing due to compatibility with old standards, and at least it wouldn’t allow duplicate column names in the select list of a CREATE VIEW statement.

The SQL standard says it might be illegal

The standard says this about Concepts:

This Clause describes concepts that are, for the most part, specified precisely in other parts of ISO/IEC 9075. In any case of discrepancy, the specification in the other part is to be presumed correct.

So we usually expect a specific syntax rule, but if it’s missing, we maybe can look at Concepts.

And this is SQL:2016, Foundation, 4.13 Columns, fields, and attributes, in the Concepts section:

Sometimes the Syntax Rules forbid two elements at different ordinal positions from having equivalent names (particularly when created through an SQL schema statement). When two elements with equivalent names are permitted, then the element name is ambiguous and attempts to reference either element by name will raise a syntax error.

That suggests that if pk appears twice in select list then ORDER BY pk; should fail, but it’s not explicitly stated in the Syntax Rules. This in contrast with GROUP BY, which has a clear Syntax Rule, that is, “Each grouping column reference shall unambiguously reference a column of the table resulting from the from clause.” Perhaps when I was on the ANSI committee I should have asked why Syntax Rule for ORDER BY doesn’t also say “unambiguously”, but it’s too late now.

MySQL and MariaDB handle it in an odd way

I tried variations of the syntax with MySQL 8.0.31 and MariaDB 10.10.1, sql_mode=only_full_group_by is not true, sql_mode=ansi_quotes is true. They both return the same results.

CREATE TABLE t (pk INT, amount INT); INSERT INTO t VALUES (1,2), (2,1),(1,2),(2,1); SELECT pk, (amount) AS pk FROM t ORDER BY pk;
Result: Error 1052 (23000) Column ‘pk’ in order clause is ambiguous
SQLSTATE values starting with 23 are for “integrity constraint violation”, so it would be better to have something starting with 42 (syntax error or access violation), I described good old ’42’ in an earlier blog post. However, the error message is good and consistent with standard SQL.

SELECT pk, SUM(amount) AS pk FROM t ORDER BY pk;
Result: 1,6.
SELECT pk, 5 AS pk, amount FROM t ORDER BY pk,amount DESC;
Result: (1,5,2),(1,5,2),(2,5,1),(2,5,1)
SELECT pk, (SELECT SUM(amount) AS pk FROM t) AS "pk" FROM t ORDER BY pk;
Result: (1,6),(2,6),(1,6),(2,6)
These results are harmless — there’s either an implied grouping or a literal for the main sort, so they’re always the same value, so they can be optimized away. Nevertheless, they’re inconsistent — if “Column ‘pk’ in order clause is ambiguous” is detectable, and it’s a syntax error not a runtime error, it shouldn’t have been ignored.

SELECT pk * - 1 AS pk, pk FROM t ORDER by pk;
Result: (-2,2),(-2,2),(-1,1),(-1,1)
SELECT pk, pk * - 1 AS pk FROM t ORDER by pk;
Result: (2,-2),(2,-2),(1,-1),(1,-1)
These results are odd, they show that we’re ordering by the pk that has an AS clause, regardless of which column is first in the select list.

SELECT SUM(amount) AS pk, pk FROM t GROUP BY pk ORDER BY pk;
Result: (2,2),(4,1)
Also there is a warning: Column ‘pk’ in group statement is ambiguous. But it does succeed, and once again the item with the AS clause is preferred. This syntax is explicitly declared illegal in standard SQL.

SELECT * FROM (SELECT 1 AS a, 1 AS a) AS t1, (SELECT 1 AS a, 1 AS a) AS t2;
Result: Error 1060 (42S21) Duplicate column name ‘a’
This is the right result, and the SQLSTATE is better because it starts with ’42’.
The reason it’s correct is that Trudy Pelzer long ago reported a bug with duplicate names in subqueries, which are illegal due to a different item in the standard, and Oleksandr Byelkin fixed it.

The Ocelot GUI for MySQL and MariaDB doesn’t flag it

An obvious thought is: since ocelotgui recognizes MySQL/MariaDB syntax, and gives hints or autocompletions while the user is entering a statement, should I be adding code to catch this? After all, we stop most illegal syntax before it goes to the server.

But I’m wary. The behaviour that I describe here is undocumented, so it could change without warning, and besides who are we to reject what the server would accept? It won’t be at the top of the “To Do” pile.

Speaking of “To Do” piles, ocelotgui 1.8 has now been released, and is downloadable from github.

Data type conversion and SQLSTATE

A while ago we got a question about data type conversion and SQLSTATE …

Suppose we have data types T1 and T2 (either built-in or user defined).

  • This pair is not explicitly listed in the Store Assignment rules.
  • There is no a CREATE CAST for this pair defined.

Reading the standard, I have an impression that these statements should fail:

UPDATE t1 SET T1_col=T2_col;
INSERT INTO t1 (T1_col) SELECT T2_col FROM t1;
SET T1_var=T2_var; — inside an SP

But I could not find which exactly error should be raised:

  • what should be SQLSTATE
  • what should be error message text

A related question:
Is it possible to do CREATE CAST for a pair of two built-in data types?

The answer was …

I will be quoting 9075-1:2011 Framework and 9075-2:2016 Foundation.

Framework says:
“In the Syntax Rules, the term shall defines conditions that are required to be true of syntactically conforming SQL language. When such conditions depend on the contents of one or more schemas, they are required to be true just before the actions specified by the General Rules are performed. The treatment of language that does not conform to the SQL Formats and Syntax Rules is implementation-dependent. If any condition required by Syntax Rules is not satisfied when the evaluation of Access or General Rules is attempted and the implementation is neither processing non-conforming SQL language nor processing conforming SQL language in a non-conforming manner, then an exception condition is raised: syntax error or access rule violation.”
… Therefore: if you see the word “shall” in a syntax rule, and the condition is not true, it is a syntax error or access violation.

Foundation says: re set clause list:
“Otherwise, the Syntax Rules of Subclause 9.2, “Store assignment”, are applied with the column of T identified by the <object column> as TARGET and the <update source> of the <set clause> as VALUE.”

Foundation says: re store assignment: syntax rules: where SD = source data type and TD = target data type:
“If TD is character string, binary string, numeric, boolean, datetime, interval, or a user-defined type, then either SD shall be assignable to TD or there shall exist an appropriate user-defined cast function UDCF from SD to TD.”
… Therefore: since the word “shall” has been used and this is a syntax rule: if the condition is not true, it is a syntax error or access violation.

Foundation says: re definition of assignable:
“A binary string is assignable only to sites of binary string type.”
“A number is assignable only to sites of numeric type.”
… And so on. That is, for each data type, there’s a note about what assignable means for that type.
Now take this as an example:
… SET binary_string_target = numeric_source …
According to “A binary string is assignable only to sites of binary string type”, this SD is not assignable to this TD.
Therefore, unless there is an appropriate user-defined cast function, the partial condition “SD shall be assignable to TD” has not been met.

For a moment ignore the other partial condition, “there shall exist an appropriate user-defined cast function”. In that case, now we know that the rule for store assignment, which is required by the rule for set clause list, has not been met. And that rule is a syntax rule that includes the word “shall”.
So it is a syntax error or access rule violation.
Table 38 “SQLSTATE class and subclass codes” says that the class for syntax error or access rule violation is 42.
Thus it’s clear — at last, eh? — that the SQLSTATE class is 42. What about the subclass?
Foundation says:
“NOTE 767 — One consequence of this is that an SQL-implementation may, but is not required by ISO/IEC 9075 to, provide subclasses for exception condition syntax error or access rule violation that distinguish between the syntax error and access rule violation cases.”
In other words, you can say SQLSTATE is 42000 but you can say other things too.
Here I recommend that you look at what DB2 for z/OS has.

“42806 A value cannot be assigned to a variable, because the data types are not compatible.” or
“42821 A data type for an assignment to a column or variable is not compatible with the data type.”

Conclusion: the SQLSTATE can be 42xxx, e.g. 42821, syntax error or access violation.

But there are two loopholes! Long ago, when my job was to say how MySQL was compatible with standard SQL, I used them both.

Loophole #1: See above: “The treatment of language that does not conform to the SQL Formats and Syntax Rules is implementation-dependent.”
Well, whenever MySQL/MariaDB has a type that is not mentioned in the official rules, such as TEXT rather than CHARACTER VARYING, that means you can declare there is no error.

Loophole #2: See above: “or there shall exist an appropriate user-defined cast function”.
Well, there is a cast function whenever MySQL/MariaDB says there is. That is, if Monty declares that you can assign a number to a varchar, while smiling, I would explain: the meaning of the word “user” is implementation-dependent. In this case, the user who created the function is Monty. That means you can declare there is no error.

But, if you declare that you are using the loopholes, then CREATE CAST will be illegal because Foundation requires: “There shall be no user-defined cast for SDT and TDT.” and, alas, where user = Monty, there is a user-defined cast.

Thanks to Alexander Barkov for the question.

ocelotgui explorer

On an unrelated note, the Ocelot Graphical User Interface has an explorer.

Sometimes called a navigator, and vaguely like what people are used to with Windows Explorer, this is a widget that appears (usually on the left but detachable) showing objects in all schemas that the user can access. If the user right-clicks on an object, an in-context menu appears with SQL statements or client actions that can be performed by clicking. Customizable.

Currently it’s only in source code which you can download from github, as always.

It and its documentation will be in the next release and/or in the next post of this blog.

ocelotgui version 1.7

Version 1.7 of ocelotgui has these changes.

Bug fix: due to a recent MariaDB change, our stored-procedure debugger could not be installed when connecting to a MariaDB server (MySQL debugging was not affected). Now all is well again.

Here, for example, is a screenshot of ocelotgui after connecting to MariaDB 10.8, saying $debug P; and then Alt+5 (“Step”) 4 times, then clicking menu item “Refresh user variables” to see the previous and current values of @a:


Incidentally I’ve just noticed some old comments about ocelotgui on with one user saying “… helped me a lot!” and another saying “I connected, debugged their tutorial, and am amazed. it’s a GREAT step debugger with variable data view on demand. lots to learn, but the #1 choice so far for free.” Don’t get the wrong idea though — debugging is a unique feature but not the major one.

New platform: Actually ocelotgui runs on any Linux distro (and Windows of course); however, the only specific instructions for creating packages were ones for .deb (Debian-based) and .rpm (Red-Hat-based). Now there is a script file for creating packages on Arch-Linux-based distros too, for example Manjaro or EndeavourOS. It’s PKGBUILD.

Enhancement: Detaching wasn’t quite as spiffy as it should be because our size calculations were slightly wrong. Here’s a screenshot showing the same widgets as before, after detaching them all and doing some manual fiddling:

As usual, downloads and executables are available on our github repository.