There’s new SQL syntax in MariaDB 12. MariaDB’s manual doesn’t document it all, so I will try. First in this series is: the Oracle-style CREATE TRIGGER … event with OR, and the standard-style table information_schema.triggered_update_columns.
event with OR
CREATE TRIGGER trigger_name
BEFORE|AFTER
INSERT|UPDATE [OF column-list]|DELETE
[OR INSERT|UPDATE [OF column-list]|DELETE ...]
ON table_name FOR EACH ROW statement_text;
The OR is new. At the time I’m writing this it’s not yet in the MariaDB manual, but the feature request for it is closed: MariaDB 12.0+ supports it.
DB2 and SQL Server and PostgreSQL support something similar, but what I think is important for MariaDB is that Oracle supports it. (The feature request is part of MariaDB’s “Oracle compatibility project”.)
So BEFORE|AFTER INSERT OR UPDATE etc. is a non-standard extension that increases Oracle compatibility although sql_mode=’oracle’ is not required.
Gripe: the feature request has the term “multiple events” but that’s wrong, INSERT OR UPDATE is only one trigger event. I’m calling it “event with OR” but there’s no good standard term.
Examples:
CREATE TRIGGER t BEFORE INSERT OR UPDATE OR DELETE ON t
FOR EACH ROW SET @a = @a + 1;
CREATE TRIGGER t2 AFTER UPDATE OF s1 OR INSERT ON t
FOR EACH ROW SET @a = @a + 1;
The effect is obvious: if the event happens, the trigger statement should happen.
The advantage is obvious: you don’t need to create nearly-duplicate triggers when the table and the statement are the same and the only difference is the event. If there are many triggers, the maintenance — and the understanding of which ones get activated before other ones — could become confusing.
Aside: Oracle takes this reduce-trigger-numbers idea to extremes by also supporting compound triggers.
event with OR, if inserting|updating|deleting
The trigger’s statement may contain the words INSERTING or UPDATING or DELETING. These words are “conditional predicates”, that is, they appear whenever a true|false decision may appear, such as in IF INSERTING, CASE WHEN UPDATING, WHILE DELETING. A conditional predicate is true only for the relevant part of a trigger event, for example if the trigger event is INSERT OR DELETE and the statement is INSERT then INSERTING is true, DELETING is false, UPDATING is illegal syntax.
Example:
CREATE TRIGGER tm
BEFORE INSERT OR UPDATE OF s1 OR DELETE ON t
FOR EACH ROW
BEGIN
CASE WHEN INSERTING THEN SET @a=0;
ELSE SET @a = 1; END CASE;
IF UPDATING OR INSERTING THEN
SET @a = 2; END IF;
WHILE NOT DELETING AND @a <> 3 DO
SET @a = 3;
END WHILE;
END;
MariaDB does not support the Oracle-style conditional predicate UPDATING(‘column_name’).
MariaDB does not support using a conditional predicate as an ordinary operand. For example, “SET declared_variable_name = INSERTING;” is legal but the result is zero.
Do not assume that UPDATING cannot be legal for an INSERT or DELETE statement, because MariaDB supports INSERT … ON DUPLICATE KEY UPDATE and supports foreign keys with ON DELETE SET NULL.
There is a new error ER_INCOMPATIBLE_EVENT_FLAG which pops up if a conditional predicate doesn’t correspond to anything in the trigger event. For example,
CREATE TRIGGER tx BEFORE UPDATE OR INSERT ON t
FOR EACH ROW
IF DELETING THEN SET @a = 0; END IF;
causes Error 4211 (HY000) Event flag ‘DELETING’ in the condition expression is not compatible with the trigger event type ‘INSERT,UPDATE’.
Gripe: I have no idea why this is called an event flag, or why treat DELETING as an error rather than treat it as false, or why the term isn’t just “trigger event”, or why the SQLSTATE class is HY for something that’s being treated as a syntax error. I don’t see this restriction mentioned in Oracle’s description of conditional predicates.
event with OR, recommendations
“UPDATE OF column1 OR UPDATE OF column2” is unnecessary — “UPDATE OF column1, column2” does the same thing and is compatible with MariaDB 11.
The clause order doesn’t have to be INSERT before UPDATE before DELETE, but there might as well be some convention. Choose whatever the MariaDB manual uses for an illustration when the MariaDB manual describes this feature.
If you care about compatibility with the standard or with MySQL or with earlier versions of MariaDB, instead of caring about compatibility with Oracle, you should continue to use nearly-duplicate triggers.
MariaDB accepts duplication such as INSERT OR INSERT and silently ignores the duplicate, so it’s your responsibility to avoid such an error.
INSERTING and UPDATING and DELETING are not reserved words. Therefore if the trigger body contains a declared-variable declaration like
DECLARE INSERTING INT;
then later IF INSERTING will be true or false depending on the variable value; INSERTING is not a conditional predicate in this context. So check that you never declared such variables.
After you say CREATE TRIGGER … event with OR, you can see it in information_schema:
SELECT event_object_table, event_manipulation
FROM information_schema.triggers;
The event_manipulation column can now be a comma-separated list such as INSERT,UPDATE,DELETE — not necessarily in the same order that you used in the CREATE TRIGGER statement.
Gripe: Why comma-delimited? The separator in the CREATE TRIGGER statement was “OR” not “,” and I think this isn’t Oracle-like, the examples I’ve seen for
SELECT triggering_event FROM all_triggers
look like ‘INSERT OR UPDATE’ not ‘INSERT,UPDATE’. e.g. the “Sample result” on this Dataedo page and the “Creating a sample trigger:” results on this stackoverflow page.
Anyway, whether or not you like it, you must change any of your existing SQL code that has anything like
WHERE event_manipulation = 'INSERT'
to
WHERE event_manipulation LIKE '%INSERT%'
This is good enough because even if the event was “UPDATE OF inserted_column” that won’t cause a false positive, for a reason that I’ll explain in the next section.
information_schema.triggered_update_columns
Unlike event with OR, this new feature is documented and standard. The information that’s missing in information_schema.triggers.event_manipulation is: what column? Even if you say UPDATE OF, the column name won’t be there because the standard says the only possible values are ‘INSERT’ | ‘UPDATE’ | ‘DELETE’. Luckily you don’t often need to know, but it could happen.
You can get something by looking at the SHOW of each trigger. For example (here I use an ocelotgui feature to make it look simple) (but it’s not simple):
SELECT `SQL Original Statement`
FROM (SHOW CREATE TRIGGER t) AS shower
WHERE `SQL Original Statement`
regexp '.*\\s*before|after\\s.*update.*of.\\ss2\\s|,*\\son\\.*'
= 1;
although this fails if there are newlines or the important words are inside comments or strings. Repeat for every trigger.
So, enter triggered_update_columns. The column names are listed in the MariaDB 12.2 manual.
Gripe: Why triggered? The update is “triggering” not “triggered”. I claimed this name was silly until I, Peter Gulutzan the standards expert, got a polite reminder from Sergei Golubchik the MariaDB expert that that’s the name in the standard. Oops.
So now, if I want to see what columns are in CREATE TRIGGER UPDATE OF clauses, I SELECT event_object_column FROM information_schema.triggered_update_columns.
Big improvement.
The standard says the criterion for triggered_update_columns is: “Identify the columns in this catalog that are identified by the explicit UPDATE trigger event columns of a trigger defined in this catalog that are accessible to a given user or role.” A column is explicit if it’s mentioned in the trigger event. If the trigger was made with
CREATE TRIGGER ... BEFORE|AFTER UPDATE OF s1 ON t ...
then s1 is explicit, so it gets a row in triggered_update_columns. If the trigger was made with
CREATE TRIGGER ... BEFORE|AFTER UPDATE ON t ...
then all columns in t are implicit, so do not get rows in triggered_update_columns although of course any changes to them cause trigger activation.
information_schema.triggered_update_columns example
You can look for either explicit or implicit columns with left joins.
Suppose I want to see all the triggers on table t1 column s2. To make it simple I assume there is only one database. This will do it:
SELECT a.trigger_name,
a.event_manipulation,
a.event_object_table,
b.event_object_table,
b.event_object_column
FROM information_schema.triggers a
LEFT JOIN information_schema.triggered_update_columns b
ON a.trigger_name = b.trigger_name
WHERE a.event_manipulation LIKE '%UPDATE%'
AND a.event_object_table = 't1'
AND (b.event_object_column = 's2'
OR b.event_object_column IS NULL);
Suppose I get two rows:
The first row is a match because there’s a TRIGGERED_UPDATE_COLUMNS row with ‘s2’ (obviously I must have earlier said CREATE TRIGGER t1u BEFORE|AFTER UPDATE OF s2 ON t1 …). The second row is a match because there’s no TRIGGERED_UPDATE_COLUMNS row so I see NULL (obviously I must have earlier said CREATE TRIGGER t1t BEFORE|AFTER INSERT OR UPDATE ON t1 …) — thus I know that all columns are affected.
ocelotgui 2.6
Thinking about incompatibility with MariaDB 11 made me think about this feature: while you’re typing, the GUI shows the possible choices for the next word as a pulldown menu. I suppose every GUI has that. But I don’t think every GUI has a tooltip saying what MySQL or MariaDB version first supported that word in that place. This is a tentative feature: it’s in the source code but not in the executables of the recently-released ocelotgui 2.6. The same is true for other features described above. As always, you can download the source code or the release from github.
To comply with GPL requirements, MySQL and MariaDB include a COPYING or LICENSE file, which has
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA.
and maybe something’s wrong with that.
To comply with RPM Package Manager requirements, distros such as Red Hat or Fedora suggest a tool rpmlint. And rpmlint reads that file and spits out:
E: incorrect-fsf-address
where “fsf” means Free Software Foundation, and “E” means Error.
First noticeable thing: the zip code 02110-1335, which appears in MariaDB COPYING, is odd. Google’s AI tells me “The zip code for 51 Franklin Street, Fifth Floor, is 02110-1301” not 02110-1335. MySQL and some other licences have 02110-1301, and the confusion goes back years. But changing the zip code doesn’t make the error message go away.
Second noticeable thing: the FSF moved out of the Franklin Street address in August 2024, says Wikipedia. They’re now a USPS postbox, and the notice the FSF recommends nowadays is
You should have received a copy of the GNU General Public License along
with this program; if not, see <https://www.gnu.org/licenses/>.
This explains why rpmlint didn’t complain before but will with the latest RPM-based distros. That’s why I ran into the error myself — I was preparing an update of ocelotgui and upgraded to Fedora 42 before I did so.
All right, so which of those does rpmlint think is correct? It doesn’t say, but it’s open source, so I looked at the source code, which is at https://github.com/rpm-software-management/rpmlint. Here are the important lines, in ./rpmlint/checks/FilesCheck.py (I’ve added some line feeds and removed some spaces):
...
fsf_wrong_address_regex =
re.compile
(br'(675\s+Mass\s+Ave|59\s+Temple\s+Place|02139|51\s+Franklin\s+St)',
re.IGNORECASE)
...
if fsf_license_regex.search(self._file_chunk) and \
fsf_wrong_address_regex.search(self._file_chunk):
self.output.add_info('E', pkg, 'incorrect-fsf-address', fname)
...
Oh, so if I say Mars or Shangri-La, okay, as long as it’s not an address that used to be valid, in any of my package’s text files.
I’ve written an open-source Linux program, pgoptionfiles.c, that tells what a MySQL or MariaDB Connector C library will choose for default option files. In my last post I’d mentioned how choices can differ and can be hard to know. I said ptrace() was a possible help, and pgoptionfiles proves that ptrace() works.
First it forks so there are two processes, a tracee and a tracer. The tracee process calls dlopen() to load a Connector C library, then calls mysql_options(…MYSQL_READ_DEFAULT_GROUP…) to request reading of all default option files, then calls mysql_real_connect() with a fake target so that the connector will actually start. Meanwhile the tracer process is monitoring the tracee’s file-related calls such as access() or fopen() — and forcing them to return errors. At the end the tracer displays the connector version and the file names.
Here are some results:
/* Ubuntu 20 + , MariaDB connector */
(pgoptionfiles)(Connector C version 3.4.3)
/etc/my.cnf
/etc/mysql/my.cnf
/home/pgulutzan/.my.cnf
/* Fedora 42 + MySQL connector */
(pgoptionfiles)(Connector C version 8.0.42)
/etc/my.cnf
/etc/mysql/my.cnf
/home/pgulutzan/.my.cnf
/home/pgulutzan/.mylogin.cnf
/* Ubuntu 20 + MySQL connector */
(pgoptionfiles)(Connector C version 8.3.0)
/etc/my.cnf
/etc/mysql/my.cnf
/usr/local/mysql/etc/my.cnf
/home/pgulutzan/.my.cnf
/home/pgulutzan/.mylogin.cnf
For details — and some warnings about how it can fail — read the many comments in the source code. Clone it from https://github.com/pgulutzan/pgoptionfiles. To run it, you need gcc plus an acceptance of GPL.
Some Connector/C options can have better explanations than what’s in the MySQL or MariaDB manuals. Here they are.
Files
Putting together the MySQL and MariaDB Connector/C manuals’ words, Connector/C reads these files in this order:
/etc/my.cnf MySQL always. MariaDB only if DEFAULT_SYSCONFDIR not defined
/etc/mysql/my.cnf MySQL always. MariaDB only if DEFAULT_SYSCONFDIR not defined
SYSCONFDIR/my.cnf MySQL if SYSCONFDIR defined. MariaDB if DEFAULT_SYSCONFDIR defined
$MYSQL_HOME/my.cnf MySQL always. MariaDB always
defaults-extra-file MySQL always. MariaDB never (but they say maybe soon)
~/.my.cnf MySQL always. MariaDB always
~/.mylogin.cnf MySQL always. MariaDB never
What they don’t say is …
If $MYSQL_HOME is null, then they won’t look in $MYSQL_HOME/my.cnf.
If $MYSQL_HOME is non-null but also $MARIADB_HOME is non-null, then MariaDB will look only in $MARIADB_HOME/my.cnf.
What really happens with environment variables is in this snippet from MariaDB’s source file ma_default.c:
/* CONC-537: Read configuration files from MYSQL_HOME directory only if
MARIADB_HOME was not set */
if (!(env= getenv("MARIADB_HOME")))
env= getenv("MYSQL_HOME");
if (env && add_cfg_dir(configuration_dirs, env))
goto error;
Similarly, you can cause a working mysql client to fail thus:
#unset the $HOME environment variable
unset HOME
#look for ~/.my.cnf -- this will work
ls ~/.my.cnf
#connect -- this will fail iff ~/.my.cnf has special options
mysql
In other words, what decides matters may be ~/.my.cnf, but may be $HOME.
And they don’t really tell you how to know if SYSCONFDIR is defined. Sure, it’s “usually” e.g. /etc if the connector was built on Red Hat, and you can see what their own statically-linked programs do by saying --help or --print-defaults, but those aren’t guarantees that the library you’re loading was built with the same cmake.
And beware, my.cnf might be symlinked. For example, I see this on Ubuntu 20:
$ ls -l /etc/mysql/my.cnf
lrwxrwxrwx 1 root root 11 May 26 2024 /etc/mysql/my.cnf -> mariadb.cnf
So, rather than believe the manuals, believe the strace utility. For example, strace shows exactly which files are opened when I run a program, optiontest, that dynamically loads a connector and connects.
The MySQL manual will tell you “When you use a startup option to set a variable that takes a numeric value, the value can be given with a suffix of K, M, G, T, P, or E (either uppercase or lowercase) to indicate a multiplier of 1024, 1024², 1024³, 1024⁴, 1024⁵, or 1024⁶; that is, units of kilobytes, megabytes, gigabytes, terabytes, petabytes, or ettabytes, respectively.” (They mean “exabytes”.)
That doesn’t apply to options in an options file. If I say option=1K, the connector sees 1 and stops evaluating when it encounters a non-numeric.
The MariaDB manual even contains an example that contains “--max_allowed_packet=1GB”, though I can’t see where “GB” is documented.
And a connector even accepts this in a my.cnf file:
max-allowed-packet = "+1111111111"
It will change the hyphens to underscores and strip the quotes (single or double quotes both work) and ignore the sign.
This is all okay, therefore it is bad. Don’t be lax re syntax of a max.
Incompatible ABI
In MySQL 5.7 and MariaDB current, the value of option MYSQL_REPORT_DATA_TRUNCATION is 19. In MySQL current, it is 14. This happens because MySQL skipped some values when upgrading to version 8.0, which you can see by comparing mysql.h before-and-after. MariaDB did not make this change. Here’s part of MySQL 8.3 mysql.h on the left, part of MariaDB 12.0 mysql.h on the right:
The skipped values were MYSQL_OPT_USE_REMOTE_CONNECTION, MYSQL_OPT_USE_EMBEDDED_CONNECTION, MYSQL_OPT_GUESS_CONNECTION, MYSQL_SET_CLIENT_IP, MYSQL_SECURE_AUTH, and later MYSQL_OPT_SSL_VERIFY_SERVER_CERT. In “Changes in MySQL 8.0” the wording is “not binary compatible”. MariaDB deserves thanks for doing nothing.
This means life would be easier if mysql.h and Connector/C and the server all had the same version and vendor. Possibly somebody can do some subtracting of the skipped values either within the connector or before calling it, thus:
1. Call mysql_get_client_info() to determine whether it returns a MySQL version such as “9.x” or a MariaDB-Connector-C version such as “3.x” (unfortunately it won’t say “MariaDB” but you can tell from the number; mysql_get_client_info is legal immediately after mysql_init so you don’t need to set options or connect).
2. Check whether MYSQL_REPORT_DATA_TRUNCATION in mysql.h is 14, to determine whether mysql.h came from MySQL or MariaDB.
3. If mysql.h and the connector are from different vendors, call mysql_options(…, option_value+-offset, …) instead of mysql_options(…, option_value, …).
However, a fewer-surprises solution is to get the connector to process the option files directly, so you don’t have to call mysql_options() for all possible numeric values. You can persuade the connector to do that with MYSQL_READ_DEFAULT_FILE (option value 4 in both connectors) or MYSQL_READ_DEFAULT_GROUP (option value 5 in both connectors).
MYSQL_READ_DEFAULT_FILE
The MySQL manual says this is a directive to “Read options from the named option file instead of from my.cnf.” (They mean “instead of any other file”.)
This would be useful, except that it’s exclusive — if you call it twice then the effects of one of the calls are erased.
With MariaDB only, if you pass mysql_options(…MYSQL_READ_DEFAULT_FILE, “”) then the connector reads all option files as if blanks are wildcards. This is not useful, though, because you can get the same effect with either MySQL or MariaDB by passing MYSQL_READ_DEFAULT_GROUP instead.
MYSQL_READ_DEFAULT_GROUP
This too is exclusive but at least you can hope that the connector reads all the default files.
And it’s not exclusive about groups, that is, if you ask for group “x” you get not only group “x” but also the default groups. (Alas, this may not be true for included files.) Or you can ask for group “” and get nothing but the default groups.
MariaDB handles the default groups [client] [client-server] [client-mariadb]. MySQL handles the default group [client] — [mysql] is supposed to be for the mysql client only. It follows then that, if you want your .cnf to work with either MySQL or MariaDB, you have to say
[client]
#options that apply for MySQL
[client-mariadb]
#options that apply for MariaDB
… a MySQL connector will ignore [client-mariadb]. A MariaDB connector will not ignore [client] but will override [client] options with the [client-mariadb] options because they come later.
Since the effects of one of the calls are erased if you call with MYSQL_READ_DEFAULT_GROUP twice, you’re limited to a maximum of one non-default group, unless you can add one more with --defaults-group-suffix. This is something that MySQL has now and MariaDB might have soon.
Combining file options with command-line options
Suppose you want to let MYSQL_READ_DEFAULT_GROUP handle what’s in the option files, but you also want to process a command line that says --connect_timeout=100.
Seems easy enough, eh? You determine that “connect_timeout” is associated with MYSQL_OPT_CONNECT_TIMEOUT, you put the value after the “=” in an unsigned int, and call:
int mysql_option_result;
mysql_option_result= mysql_options(mysql, MYSQL_READ_DEFAULT_GROUP, "client");
printf("d after MYSQL_READ_DEFAULT_GROUP %d\n", mysql_option_result);
mysql_option_result= mysql_options(mysql, MYSQL_OPT_CONNECT_TIMEOUT, &connect_timeout);
printf("**** d2 after mysql_options MYSQL_OPT_CONNECT_TIMEOUT %d\n", mysql_option_result);
(The option numbers in these two cases are the same for all connectors.)
You see mysql_option_result is 0 both times, you check later with mysql_get_option() and find that the value is 100, and all seems well. But it’s not. Because: the order of option execution is not necessarily the order that you call mysql_options(). In my tests, if a my.cnf file just happens to contain “connect_timeout=200”, that overrides. Even if that’s okay (to me it doesn’t look okay), it’s still undocumented behaviour.
(Aside: if the only command-line options are host|user|password|database|port|socket, these control mysql_real_connect() arguments so you don’t need to worry about an inability to use mysql_options() for them.)
A decent solution is: explain that command-line options might not take effect, and print warnings if mysql_get_option(…MYSQL_OPT_CONNECT_TIMEOUT… ) result is not 100. And yet that’s not a great solution — people might expect that command-line options are trumps. Any way to force that?
Well, I can think of two ways:
1. Open ~/.my.cnf for appending, strip the leading -- from the arguments and put them in, call mysql_options with MYSQL_READ_DEFAULT_FILE, call mysql_real_connect, then remove what you appended.
2. Create a temporary file, make the first line something like this (replacing “$HOME” with your full path): !include $HOME/.my.cnf. Then strip the leading -- from the arguments and put them in, call mysql_options with MYSQL_READ_DEFAULT_FILE, call mysql_real_connect, then remove the temporary file.
But: you might not have write privileges (even if you do, there might be a sticky bit as if you’re in a sandbox). Or: you might conflict with some other program that wants the same files. Or: you’d be missing the rest of the default option files besides ~/.my.cnf.
MySQL’s ability to handle ~/.mylogin.cnf is an advantage over MariaDB. But read my earlier post and you’ll see that I supply a decrypter as part of the ocelotgui package.
MYSQL_OPT_INIT_COMMAND
Assuming you have a client that works with default option files for either MariaDB or MySQL, you can run a quick test.
Start the client. Make this simple sample table:
CREATE DATABASE w;
CREATE TABLE w.t (etc INT, home INT);
INSERT INTO w.t VALUES (0, 0);
Exit.
Add these two lines at the end of ~/.my.cnf:
init_command = "UPDATE w.t SET etc = etc + 1;"
init_command = "UPDATE w.t SET home = home + 1;"
Start the client again and say
SELECT * FROM w.t;
The result will look like this:
Now realize: this result proves that both UPDATE statements happened. The init_command option is an undocumented example of what one could call a “multi-element” option. An ordinary option overwrites (cancels) any earlier occurrence. With a multi-element option, if you say it twice, there are two things to execute (and the order of execution isn’t guaranteed).
This could be unpleasant if the first occurrence is hidden in /etc/my.cnf and you only see the second occurrence in ~/.my.cnf and think it’s all there is. Add that to the fact that the updates might not appear in the client’s log.
So if your client supports anything like “SOURCE file-with-statements”, and the statements are for the initial connect (not auto-reconnect), it’s probably better to use SOURCE instead of init_command.
MYSQL_OPT_CONNECT_ATTR_ADD
The MariaDB connector will translate mysql_options4() to mysql_optionsv().
There are other programs that will add “program-name” here, so you might as well too. It will appear in a performance_schema table, so you might as well have performance_schema on. It’s a way to put a comment in a my.cnf file that will be visible with SELECT.
Like init_command, MYSQL_OPT_CONNECT_ATTR_ADD is a multi-element option. So to avoid duplication you might want to say MYSQL_OPT_CONNECT_ATTR_DELETE first.
mysql_get_option
MariaDB is recommending mysql_get_optionv() but mysql_get_option() still works.
Or rather, it works after mysql_real_connect(). It’s useless to call mysql_get_option() until you’re connected, and it’s undefined-results-time if the option wasn’t set earlier.
I thought that the manuals’ examples for strings looked odd, so I supply this example instead (notice the &):
const char *set_charset_name= NULL;
int option_result= mysql_get_option(mysql, MYSQL_SET_CHARSET_NAME, &set_charset_name);
if (set_charset_name == NULL) printf("(null)\n");
else printf("**** %d, MYSQL_SET_CHARSET_NAME=%s\n", option_result, set_charset_name);
Some mysql_get_option() variations might fail even though the option is set correctly and the call is in the right place. This might tempt people to look at session variables instead, but they’re tricky too. For example, mysql_get_option(…MYSQL_OPT_MAX_ALLOWED_PACKET…) does not return the same thing as SELECT @@session.max_allowed_packet.
socket
Suppose, given some of what I’ve described, you decide it’s better to use MariaDB Connector/C when connecting to a MariaDB server, and use MySQL Connector/C when connecting to a MySQL server. So how do you know what you’re connecting to, before you connect?
Well, on my local network I have instances of both servers. For one of them I can use telnet and port 3309:
~$ telnet 192.168.1.68 3309
Trying 192.168.1.68...
Connected to 192.168.1.68.
Escape character is '^]'.
I
9.3.0$xPUPGNw
�99PRRRp!caching_sha2_password2#08S01Got timeout reading communication packetsConnection closed by foreign host.
For the other I can use telnet and port 3310:
$ telnet 192.168.1.68 3310
Trying 192.168.1.68...
Connected to 192.168.1.68.
Escape character is '^]'.
R
12.0.1-MariaDB&J%xPd*@�=Dr5Ng?ExQh<.mysql_native_passwordConnection closed by foreign host.
The important value on port-3309 is “9.3.0” so I know it’s MySQL 9.3.0. The important value on port-3310 is “12.0.1-MariaDB” so I know it’s MariaDB 12.0.1.
I can read these things within a program via the socket() utility, and discard. Then with pgfindlib I can pick the connector on my machine that’s best for that port.
But I don’t.
For now, I find it simpler to just suggest in the manual that one of the options on the command line or in an option file should say what the expected server is.
documentation bugs
In June 2025 I submitted a bug report with 15 complaints about MariaDB documentation. I don’t mean to suggest that the MySQL documentation is better.
ocelotgui
If you’re wondering “Will Peter Gulutzan follow his own advice?” — oh yes, I intend to implement the suggestions I’ve made here, in ocelotgui. I’m just going to wait a week or two in case somebody finds a flaw. Meantime, ocelotgui version 2.5 is not doing a wonderful job of connecting, which I acknowledge. As always, source and executables are downloadable from github.
Errors?
I might be mistaking or omitting things, so read this blog post again after a while, in case I’ve had to make corrections. If you find a particular mistake or omission, comment on this post or send email, my name is pgulutzan and my domain is ocelot.ca.
Update: Comments are now closed. As a solution for one of the problems described in this post, I wrote a program to list MySQL or MariaDB default option files. See the next post in this series.
The MySQL library is libmysqlclient.so, the MariaDB library is libmariadbclient.so or libmariadb.so. I’ll list some quirks that I think are relatively unknown but good to know. I’ll end with a pointer to a function that’s good to have.
mysql and mariadb clients don’t themselves use .so libraries
To illustrate, here’s the file mysql.dir/link.txt that I got when building MySQL’s source with -DWITHOUT_SERVER=ON.
As you can see if you squint, it’s bringing in libmysqlclient.a, the static library.
This affects not only mysql but other executables that MySQL provides. And effects are similar with MariaDB’s source.
MySQL manual says LD_RUN_PATH decides the client .so library
Specifically the “Environment Variables” section says in a chart beside LD_RUN_PATH: “Used to specify the location of libmysqlclient.so.” and the “Problems Using the Perl DBI/DBD Interface” section says “Add the path name of the directory where libmysqlclient.so is located to the LD_RUN_PATH environment variable. Some systems use LD_LIBRARY_PATH instead.”
These are not falsehoods but I think they could mislead Linux users.
First, as I mentioned earlier, it doesn’t necessarily apply for what they supply.
Second, as I’ll mention later, on “some systems” — e.g. Linux, eh? — LD_RUN_PATH only has an effect at build time (as when you run gcc) and Linux package developers deprecate it.
Third, as I’ll also mention later, there are several other factors that the dynamic loader will examine though the MySQL manual doesn’t mention them all.
As a client developer I have to pay attention to this advice and look at LD_RUN_PATH when calling dlopen, but it’s not my favourite advice.
MariaDB library isn’t always the same
I said at the start that “the MariaDB library is libmariadbclient.so or libmariadb.so” but it’s not so simple.
There’s a chance that symlinks will exist between libmysqlclient.so and a MariaDB .so.
There’s also a chance that both libmariadbclient.so and libmariadb.so will be available, and libmariadbclient.so will have a symlink to libmariadb.so which is the real target. However, this is not always the case.
I’ve left out the irrelevant stuff. The point is, both .so names are there.
Illustration #2: This is what the MariaDB Connector/C contains after a recent download:
~/connector-c/usr/local/lib/mariadb$ ls
libmariadb.a libmariadbclient.a libmariadb.so libmariadb.so.3 plugin
I’ve left out the irrelevant stuff. The point is, only one .so name is there. So usually you’ll have no trouble linking to libmariadbclient.so but it’s not guaranteed any more.
The server can display the .so
To see this you’ll have to run the server with performance_schema on, which is the default for MySQL but not for MariaDB, where you’ll have to ask for it.
Example pasted from an ocelotgui session:
Notice the name and the version number. But if the library was statically linked, and not an .so, this won’t tell you.
But if the client program chooses a different path, this won’t tell you.
Linux utilities can display the .so and the .so path
As well as the programs that come with MySQL or MariaDB that show .so files, there are programs that come with Linux that show .so files. They work with any MySQL or MariaDB executable, but in this section I’ll show examples with common utilities instead because they’re shorter.
I admit that I’m depending on common Linux and ELF format for the examples; if it had to be general I’d perhaps think that libbfd would help me, but I’ve never seen the need.
FIND! Of course one of the programs is the simple sudo find / -name "libmy*.so*" or sudo find / -name "libmaria*.so*" but it takes too long and it doesn’t show the files in the context of the caller. So the value is small.
… These are some standard system paths that ld the GNU linker will look at on a multiarch machine. But they are not necessarily what the dynamic loader will look at so the value is small.
STRACE! For example, if you have ocelotgui, strace ocelotgui 2>/dev/stdout| grep openat will display
openat(AT_FDCWD, "/usr/mysql/lib/tls/x86_64/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/mysql/lib/tls/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/mysql/lib/tls/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/mysql/lib/tls/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/mysql/lib/x86_64/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/mysql/lib/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/mysql/lib/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/mysql/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/tls/x86_64/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/tls/x86_64/libdl.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
… And so on, for failed attempts to open .so files. The failures are expected, by the way. Subdirectory tls (thread local storage) tends to be absent.
LD_DEBUG! Now for examples I’ll need any program that happens to use the dynamic loader, /bin/cp will do.
Start with the readelf utility to see what a program’s dynamic loader is. Example:
$readelf -l /bin/cp | grep interpreter
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
Okay, in this case the dynamic loader is /lib64/ld-linux-x86-64.so.2. (The same name appears in the earlier ldd example above but I think that’s undependable.) It differs depending on the platform and the program, so always check.
First make sure that all LD_ environment variables are unset. Then: LD_DEBUG=libs /lib64/ld-linux-x86-64.so.2 --inhibit-cache /bin/cp 2>/dev/stdout | grep search. You’ll see something like
LDCONFIG! Vaguely speaking, sudo make install can change /etc/ld.so.conf, ldconfig can load from /etc/ld.so.conf into the cache, and ldconfig -p can show what’s in the cache. This is good for administrator-approved libraries, as opposed to ones that have been installed in a non-system local directory.
DPKG! The dpkg package manager is on Debian-based distros; on rpm-based distros the equivalent is whatprovides. This is telling you what is supposed to be there, as opposed to what is actually there.
All of the above utilities are fine for their purposes but I’m after something different — a function I can call to find libraries at runtime, loaded or unloaded. I’ll describe the solution at the end of this blog post.
There are ways to force the path
The usual dynamic loader (often on /lib64/ld-linux-x86-64.so.2 or /lib/ld-linux.so.2 but read the previous section to check) is going to search in this order:
LD_AUDIT — environment variable
LD_PRELOAD — environment variable
DT_RPATH — set by -Wl non-default option
LD_LIBRARY_PATH — environment variable
DT_RUNPATH — set by -Wl default option or LD_RUN_PATH environment variable during build
ld.so.cache and defaults.
LD_AUDIT: specialized and difficult. I mention it but won’t say more about it.
LD_PRELOAD: I’ve seen this recommended for the MySQL or MariaDB server but for clients it’s probably only a good choice if you need to force a specific .so file, as opposed to a specific path containing .so files. Oracle used to suggest this for their drop-in replacement.
DT_RPATH: set before runtime. With gcc the way to force DT_RPATH is gcc … -Wl,-rpath,<library_path>,--disable-new-dtags The --disable-new-dtags is necessary nowadays because DT_RPATH is set with the old tag. With the default new tag, -Wl,-rpath sets DT_RUNPATH instead. In other words, you’ll probably never see this.
LD_LIBRARY_PATH: or actually “ld_library_paths” plural, because you can put multiple paths delimited by colons or semicolons.
DT_RUNPATH: set before runtime. With gcc the way to force DT_RUNPATH is gcc … -Wl,-rpath,<library_path> or LD_RUN_PATH=<library_path> gcc … but setting the LD_RUN_PATH environment variable has no effect at runtime.
ld.so.cache: This is a list of the .so files that the dynamic loader will look for by default. It’s visible with ldconfig -p. Typically the list is changed when you sudo make install something, so although ldconfig’s not the top choice it’s the most reliable one: you know that some package installer thought that the .so in this list is an appropriate one.
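The precedence list above can be sketched as a small model. This is my own simplification for illustration, not glibc’s real algorithm: it ignores LD_AUDIT and LD_PRELOAD (which name specific files rather than directories) and hwcaps subdirectories, and it takes dt_rpath/dt_runpath as plain parameters even though in reality they are baked into the ELF file.

```python
def search_dirs(env, dt_rpath=None, dt_runpath=None):
    """Model of the directory order the dynamic loader would search."""
    dirs = []
    if dt_rpath and not dt_runpath:  # DT_RPATH is consulted only when DT_RUNPATH is absent
        dirs.extend(dt_rpath.split(":"))
    # LD_LIBRARY_PATH entries may be delimited by colons or semicolons
    for p in env.get("LD_LIBRARY_PATH", "").replace(";", ":").split(":"):
        if p:
            dirs.append(p)
    if dt_runpath:
        dirs.extend(dt_runpath.split(":"))
    dirs.append("<ld.so.cache and default paths>")
    return dirs

print(search_dirs({"LD_LIBRARY_PATH": "/tmp/A:/tmp/B"}, dt_runpath="/opt/lib"))
# → ['/tmp/A', '/tmp/B', '/opt/lib', '<ld.so.cache and default paths>']
```

The point the model makes is the one in the text: LD_LIBRARY_PATH wins over DT_RUNPATH, and DT_RPATH only matters in old binaries that have no DT_RUNPATH.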
You’ll notice that DT_RPATH and DT_RUNPATH don’t get in unless they’re specified before runtime. And if you specify them yourself and then try to make a .deb package, lintian will tell you: they’re deprecated. For another package type, SUSE will tell you they’re forbidden. So they don’t matter so much.
Since environment variables do matter so much, it can be a good idea to check whether they’re already set, for example you put something in .bashrc and forgot. The best way to check environment variables is printenv (not echo). To check all the environment variables that start with LD_, say printenv | grep LD_
Since environment variables matter so much, it can be a good idea to set them in a canonical way. For example, to say you want the dynamic loader to look in /tmp/A and /tmp/B, say LD_LIBRARY_PATH=/tmp/A:/tmp/B program-name and to temporarily turn it off, say env -u LD_LIBRARY_PATH program-name. This is slightly better than export LD_LIBRARY_PATH=/tmp/A:/tmp/B because export has a permanent effect rather than an effect specific to program-name, and because with export you can only set the environment variable to blank rather than getting rid of it.
Another thing is that the delimiter between path names is a colon — yes, a semicolon would work here but it wouldn’t work with -Wl,-rpath and you might as well have the same delimiter in all possible lists.
I didn’t surround the path names with 's or "s in the example, but if there’s a $ then their absence could cause confusion. For example: export LD_LIBRARY_PATH=/$LIB then printenv LD_LIBRARY_PATH … You’ll see only the / character, because $ is a signal that an environment variable name follows, unless it’s quoted or escaped.
$LIB and $PLATFORM and $ORIGIN
The dynamic loader will replace occurrences of $LIB or $PLATFORM or $ORIGIN within path names.
$LIB: According to a “Linux manual page” and according to man ld.so on my computer, “$LIB (or equivalently ${LIB}) This expands to lib or lib64 depending on the architecture (e.g., on x86-64, it expands to lib64 and on x86-32, it expands to lib).”
Hah! Just kidding. Pilot Logic has a page indicating that’s true for what they call “Red Hat Base” and “Mandriva Base” but not for “Debian Base” or “Arch Base” or “Gentoo Base”. (It’s actually the distro’s glibc that’s responsible.) And there might be more differences in the future.
On top of that, if you go back and look at the example of LD_DEBUG use, you’ll see a bunch of /tls (thread local storage) subdirectories. Luckily, although $LIB can expand to include them, they won’t exist.
$PLATFORM: expect something like “x86_64” on Debian-like 64-bit platforms. Hah! Just kidding again. Debian’s “ArchitectureSpecificsMemo” indicates other possibilities.
$ORIGIN: This is supposed to be the path of the program that’s running. I’ve seen complaints that it’s actually the path of the library in the program that’s running, but I haven’t understood the difference. So if you say /usr/bin/wombat then the $ORIGIN is /usr/bin. And if you build with gcc … -Wl,-z,origin,-rpath,./lib,-rpath,\$ORIGIN that’s what you’ll see — though notice the escaping of the $ here.
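A toy expander shows the substitution. The default lib_value and platform_value here are Debian-flavoured assumptions; as noted above, the real values vary by distro and are chosen by the loader, not by you:

```python
def expand_tokens(path, origin, lib_value="lib/x86_64-linux-gnu", platform_value="x86_64"):
    # The dynamic loader substitutes $ORIGIN, $LIB and $PLATFORM (also the ${...} forms)
    for token, value in (("ORIGIN", origin), ("LIB", lib_value), ("PLATFORM", platform_value)):
        path = path.replace("${" + token + "}", value).replace("$" + token, value)
    return path

print(expand_tokens("$ORIGIN/../lib", "/usr/bin"))    # → /usr/bin/../lib
print(expand_tokens("/opt/app/${LIB}", "/usr/bin"))   # → /opt/app/lib/x86_64-linux-gnu
```

So a binary built with -rpath,$ORIGIN/../lib can be relocated together with its lib directory, which is the usual reason people reach for $ORIGIN.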
How ocelotgui handles it
There’s no reliable way to decide at install time what the client library will be, because the name varies, the location varies, and the appropriate library might depend on the server that ocelotgui is connecting to.
which is reasonable. However, it’s disabled for the .deb or .rpm packages. By default it looks first for libmysqlclient then libmariadbclient then libmariadb, but that’s easy to change with command-line options or in the .cfg file.
And if there’s no non-default specification then the standard-loader paths matter.
This means runtime flexibility is required, and that means we have to use dlopen(). Therefore ldd ocelotgui will tell you nothing, but in this case it’s because we’re looking everywhere, which is quite different from the mysql/mariadb client case because they’re bringing in a .a library.
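The try-each-candidate idea can be sketched with Python’s ctypes, whose CDLL constructor calls dlopen() underneath. The client-library names are the ones mentioned above; libm.so.6 is thrown in only so the sketch demonstrably loads something on an ordinary glibc Linux system:

```python
import ctypes

def first_loadable(candidates):
    """Return (name, handle) for the first library that dlopen() can load, else (None, None)."""
    for name in candidates:
        try:
            return name, ctypes.CDLL(name)  # ctypes.CDLL wraps dlopen()
        except OSError:
            pass  # not on any search path; fall through to the next candidate
    return None, None

# ocelotgui-style preference order (availability varies by machine):
print(first_loadable(["libmysqlclient.so", "libmariadbclient.so", "libmariadb.so"])[0])
# a library that exists on practically every glibc Linux system:
print(first_loadable(["libm.so.6"])[0])  # → libm.so.6
```

The dlopen() search itself follows exactly the precedence order described in the previous sections, which is why all that detail matters to a program built this way.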
Finally, Help|About and Help|libmysqlclient will say more about what’s actually connected or could be.
There will have to be a new version that takes into account the points that I’ve described here. It will incorporate the code of a new GitHub project: pgfindlib.
pgfindlib
I’ve created a small new GitHub project, pgfindlib. It has major advantages over all the things I’ve described in this post, if the objective is to find relevant .so files at runtime, without cruft, from within a program.
I’ll just quote the first bit of its README here: …
Version 0.9.7
The pgfindlib function finds dynamic libraries (.so files) in the order the loader would find them.
Copyright (c) 2025 by Peter Gulutzan. All rights reserved.
What pgfindlib is good for
Knowing what .so files the loader would load, but loading them yourself with dlopen(), means you can customize at runtime.
Or, as part of –help you can tell users what the loader picked up, and from where, and what choices it ignored.
An example: Using files supplied with the package:
1,,,002 pgfindlib,001 version 0.9.7,003 https://github.com/pgulutzan/pgfindlib,,
2,,,005 $LIB=lib/x86_64-linux-gnu,006 $PLATFORM=x86_64,007 $ORIGIN=/home/pgulutzan/pgfindlib,,
3,,,012 in source LD_LIBRARY_PATH replaced /$LIB with /lib/x86_64-linux-gnu,,,,
4,/lib/x86_64-linux-gnu/libcurl.so,LD_LIBRARY_PATH,013 symlink,,,,
5,/lib/x86_64-linux-gnu/libcurl.so.4,LD_LIBRARY_PATH,013 symlink,,,,
6,/lib/x86_64-linux-gnu/libcurl.so.4.6.0,LD_LIBRARY_PATH,,,,,
7,/lib/x86_64-linux-gnu/libgcc_s.so.1,LD_LIBRARY_PATH,,,,,
8,/lib/x86_64-linux-gnu/libutil.so,LD_LIBRARY_PATH,013 symlink,,,,
9,/lib/x86_64-linux-gnu/libutil.so.1,LD_LIBRARY_PATH,013 symlink,,,,
10,/tmp/pgfindlib_example/libutil.so,DT_RUNPATH,071 elf read failed,,,,
11,/lib/libgcc_s.so.1,ld.so.cache,,,,,
12,/lib/x86_64-linux-gnu/libcurl.so,ld.so.cache,013 symlink,014 duplicate of 4,,,
13,/lib/x86_64-linux-gnu/libcurl.so.4,ld.so.cache,013 symlink,014 duplicate of 5,,,
14,/lib/x86_64-linux-gnu/libgcc_s.so.1,ld.so.cache,014 duplicate of 7,,,,
15,/lib/x86_64-linux-gnu/libutil.so,ld.so.cache,013 symlink,014 duplicate of 8,,,
16,/lib/x86_64-linux-gnu/libutil.so.1,ld.so.cache,013 symlink,014 duplicate of 9,,,
17,/lib32/libgcc_s.so.1,ld.so.cache,075 elf machine does not match,,,,
18,/lib32/libutil.so.1,ld.so.cache,013 symlink,075 elf machine does not match,,,
19,/lib/libgcc_s.so.1,default_paths,014 duplicate of 11,,,,
20,/usr/lib/libgcc_s.so.1,default_paths,014 duplicate of 11,,,,
rval=0
This means: the loader would look first in /lib/x86_64-linux-gnu because of LD_LIBRARY_PATH. This takes precedence over DT_RUNPATH, which is where the first occurrence of libutil.so appears (it appears because of the -rpath option in the gcc command). Finally there are some .so libraries in ld.so.cache and the system libraries, which is where the loader would go if nothing earlier matched. But some of the lines contain warnings, for example "071 elf read failed" because /tmp/pgfindlib_example/libutil.so is not a loadable file, or "075 elf machine does not match" because main is 64-bit and /lib32/libgcc_s.so.1 is 32-bit.
That’s all you need to know in order to decide if you’re interested. If you are, read on, there are many options and a few warnings.
… Yes, the input is [FROM clause] WHERE clause, the output is table.
Now go over to the pgfindlib repository https://github.com/pgulutzan/pgfindlib and clone it. Since you’ve read all this, you’ll want that.
CREATE PROCEDURE p()
BEGIN
DECLARE v CHAR CHARACTER SET utf8 DEFAULT '';
SELECT * FROM information_schema.routines
WHERE routine_name > v;
END;
CALL p();
Result: with current MariaDB 11.4 or MySQL 8.3: a result set. Result: with new or future MariaDB: an error message mentioning collations.
The error appears with MariaDB 11.5.1 and later, except for Debian-based distros such as Ubuntu where it appears with MariaDB 11.5.2 and later. It is not a bug, it is a behaviour change.
I’ll describe the error’s cause, the rare situations where the change breaks existing applications, the ways to check, and a workaround.
Assume routine’s character set is Unicode
Let’s say I installed a DBMS and created a UTF8 database. Then I can look at the database’s defaults with
SELECT * FROM information_schema.schemata;
If the server is MySQL 8.3: The database that I created has default_character_set_name=utf8mb4, default_collation_name = utf8mb4_0900_ai_ci. System databases have default_character_set_name=utf8mb3, default_collation_name = utf8mb3_general_ci.
If the server is MariaDB 11.4: The database that I created has default_character_set_name=utf8mb3, default_collation_name = utf8mb3_general_ci. System databases have default_character_set_name=utf8mb3, default_collation_name = utf8mb3_general_ci.
If the server is MariaDB 11.6: The database that I created has default_character_set_name=utf8mb3, default_collation_name = utf8mb3_uca1400_ai_ci. System databases have default_character_set_name=utf8mb3, default_collation_name = utf8mb3_general_ci.
(A system database is sys, performance_schema, information_schema, or mysql.)
The difference between utf8mb3 and utf8mb4 is usually not important. Maybe bigger is better, but I looked at an admittedly-old stack overflow question which established that the extra characters are mostly Gothic, emoji, new Chinese personal names, and some mathematical symbols. At the moment you can’t create routines which require 4-byte UTF8 anyway which means the effect for system tables is nothing.
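To see concretely what the mb3/mb4 distinction means: the two character sets differ only in the maximum encoded length per character, and a quick check of encoded lengths shows which characters need a fourth byte. (This is plain UTF-8 arithmetic, not anything MariaDB-specific.)

```python
# utf8mb3 allows at most 3 bytes per character, i.e. the Basic Multilingual Plane;
# supplementary characters such as emoji encode to 4 bytes and need utf8mb4.
for ch in ("A", "ä", "€", "😀"):
    print(repr(ch), len(ch.encode("utf-8")), "bytes")
# 'A' is 1 byte, 'ä' is 2, '€' is 3, '😀' is 4 -- only the last is outside utf8mb3
```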
The difference between uca_0900 and uca_1400 collations is usually not important. The former refers to Unicode Collation Algorithm Version 9 (2016), the latter to Unicode Collation Algorithm Version 14 (2021). UCA’s version number is “synchronized with the version of the Unicode Standard which specifies the repertoire covered”, which means the order of new emojis or rarely-used scripts will differ. For example the Unicode 14 release notes mention the bronze-age Cypro-Minoan script, so if you use that, you should care.
However, the difference between latin1 and utf8 is usually so important that — I hope — many people will explicitly declare CHARACTER SET UTF8 or some variant, when creating the database or creating the routine. As the manual says: “If CHARACTER SET and COLLATE are not present, the database character set and collation in effect at routine creation time are used.”
Therefore, “DECLARE v CHAR;” will often mean “DECLARE v CHAR CHARACTER SET utf8;”. I’ll assume that’s the case. That’s what causes the error; with a latin1 default it won’t happen.
The cause
The error message is due to the enhancement request Change default Unicode collation to uca1400_ai_ci which has explanations from the requester and the MariaDB collation expert about some of the things to watch out for, other than this error message. As well, you can find a condemnation of the non-UCA collations such as utf8mb3_general_ci.
The general collation is not awful; for example it knows that “B with dot above” is equal to B.
However, it sees differences where a Unicode _ai_ci collation does not. For example,
But this makes it dangerous to change a system table. If the server manufacturer simply declared that the default collation of information_schema is henceforth UCA, then suddenly the indexes are no good and primary keys contain duplicates for people who happen to have created routines or tables with similar names.
Everybody learned their lesson about that from a long-ago fiasco. As I commented about it in a heated bug report in 2011: “Why don’t we just change the rules for utf8_general_ci, instead of introducing new collations with new rules? Well, as a matter of fact, that happened for another rule affecting German Sharp S in our version 5.1. The results were catastrophic, because collation affects order of keys in an index, and when index keys are out of the expected order then searches fail (Bug#37046 etc.). The only solution is to rebuild the index and when we have customers with multiple billion rows that’s hugely inconvenient. This change was a stupid error, we have sworn not to repeat it.”
Check routine for upgrade
This isn’t the same as CHECK TABLE … FOR UPGRADE and I don’t think it can be automated, it depends on humans and their particular situations.
Probably most straightforward upgrades will not cause a change of the default, so there will be no immediate effect — until you create a routine containing the same expression that worked before, which won’t work now.
Watch out for comparisons of variables with any columns in system tables.
Watch out for DECLARE HANDLER statements which could hide the error.
Watch out for CREATE TABLE statements which use the variable as a source because you don’t want to change their collation inadvertently.
Watch out for parameters too, as the passer’s character set might differ from the catcher’s.
Workaround: explicit collation
Add a COLLATE clause. This will indicate the approved collation, so now the comparison will not cause an error.
That is, change DECLARE v CHAR; /* assuming UTF8 database or explicit UTF8 at some higher level */ to DECLARE v CHAR CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci; rather than changing WHERE routine_name > v; to WHERE routine_name > v COLLATE utf8mb3_general_ci;
This is a minimal change. It insists on the general collation, rather than a Unicode collation, so it’s compatible with the comparand, less likely to cause an error, and more likely to allow an indexed search.
The COLLATE in the DECLARE clause usually will not interfere with later statements that contain COLLATE. For example, here is a procedure where there might seem to be a conflict:
CREATE PROCEDURE ae()
BEGIN
DECLARE v CHAR CHARACTER SET utf8mb3
COLLATE utf8mb3_general_ci;
SET v = 'ä';
SELECT coercibility(v);
SELECT coercibility((SELECT max(routine_name
COLLATE utf8mb3_german2_ci)
FROM information_schema.routines));
SELECT * FROM information_schema.routines
WHERE routine_name COLLATE utf8mb3_german2_ci = v;
END;
CALL ae();
The nifty coercibility function will say that v is 2 (implicit) while the routine_name is 0 (explicit), therefore the WHERE’s COLLATE clause trumps the variable’s COLLATE clause, therefore there is no error, therefore ‘ä’ = ‘ae’ is true, therefore there will be one row in the result set.
The workaround has three flaws.
1. For MariaDB 11.6 there’s a feature request, Change default charset from latin1 to utf8mb4. So far I haven’t seen that this has happened. But when it does, CHARACTER SET utf8 COLLATE utf8mb3_general_ci could cause trouble because utf8 won’t mean utf8mb3.
Thus COLLATE utf8mb3_general_ci may not be good for the long term. I actually prefer CAST(operand AS CHAR(n) CHARACTER SET utf8mb4) COLLATE utf8mb4_general_ci as something that can work today and can probably work tomorrow.
2. For MySQL 8.3 (which is obsolete) and a similar procedure that looked at tables, I saw the bizarre error message “Error 1267 (HY000) Illegal mix of collations (utf8mb3_tolower_ci,EXPLICIT) and (utf8mb4_0900_ai_ci,EXPLICIT) for operation ‘='” which must have been a bug since there’s no such thing as utf8mb3_tolower_ci, and it doesn’t happen for MySQL 8.4.
3. It’s extra trouble for something that you don’t need in legacy MariaDB versions. To which you might be saying, “Duh, Peter, just use MariaDB’s Executable Comment Syntax to suppress the unneeded stuff when the server isn’t the latest.” Well, yes, but I don’t use non-standard syntax if I don’t have to.
Thanks to Alexander Barkov for a suggestion.
How this affected me
The ocelotgui (GUI for MySQL and MariaDB) contains a debugger for stored procedures and functions. It contains DECLARE … CHAR CHARACTER SET utf8; many times. This behaviour change broke it. This commit on GitHub shows my own workaround for it. It did not go through testing with the 11.6 preview. Yesterday I included it in the latest release.
But should the GUI warn about this when parsing CREATE PROCEDURE statements? If anyone thinks so, they can submit a bug report / feature request.
Read The language rules we know – but don’t know we know which says “Adjectives in English absolutely have to be in this order: opinion-size-age-shape-colour-origin-material-purpose Noun.” Then you’ll know this is correct:
because columns are attributes and so are adjectives.
Now I’ll tell you the other factors that affect whether chicken columns precede egg columns.
Primary key columns first
As the Oracle “Ask Tom” guy said unenthusiastically in 2012: “For example – most people – pretty much all people – would put the primary key FIRST in the table. It is a universally accepted approach – convention.” Sometimes this is part of a “rule of thumb”. Sometimes it’s part of “logical data modeling standards”. Sometimes it’s part of “business rules”.
The United States Postal Service says “Automated mail processing machines read addresses on mailpieces from the bottom up and will first look for a city, state, and ZIP Code.” because those folks know that what matters first is the larger area and what matters last is the street spot.
That’s also what people care about so often they’ll say genus before species, or schema name before object name, or test type before test result.
Specificity is not the same as uniqueness — there might be more addresses “100 Main Street” than there are states in the USA. But it is about moving from the target to the bullseye. Unique-first might be an advantage for CREATE INDEX, but not for CREATE TABLE. And that might mean you might want to say CREATE TABLE t (column1, column2, PRIMARY KEY (column2, column1)) but it does not mean you want to reverse the order of display.
Surname first
Looking at the States stats again, there are about 5 million occurrences of the most common given name (James) and about 2.5 million occurrences of the most common last name (Smith). That’s anecdotal but corresponds to the general belief that last names are uniquer than first names. What matters more is that the name James is less useful for identification, and therefore less important.
Alphabetical
If you’ve ever heard “Choose one from column A and one from Column B” you’ll feel this isn’t quite absurd, and if you’ve got 26 columns named column_a through column_z then you’ll appreciate it if there’s an arbitrary-but-familiar way to glance through them.
And in fact it’s not arbitrary if your column names have prefixes that indicate what group they belong to. Surely user_name and user_address belong together, or if it’s the address suffix that matters more then alphabetical DESC would work though I’ve never seen it proposed.
Important before trivial
There’s advice that columns should appear in order of importance, and if users don’t know what’s important then there are clues.
Is the value usually the same or usually NULL? Then it carries little information.
Is the column name absent from most SELECT statements? Then most people don’t care.
Is it indexed, or unique, or in a foreign key? Then somebody has already decided it matters. Accept that and bump them closer to the start.
What matters for SELECT * is: the columns that you like most should be on the left of the screen (even if you’re Arab) so you don’t need to use horizontal scrolling.
What matters for SELECT column_x is: if the row is so big that it might stretch over two blocks, the DBMS might be happier if the column is in the first block.
The ancient wisdom
There’s a rule that I’ve seen often, with occasional variations: “Primary key columns first. Foreign key columns next. Frequently searched columns next. Frequently updated columns later. Nullable columns last. Least used nullable columns after more frequently used nullable columns. Blobs in own table with few other columns.”
So, for example, if your DBMS happens to be one that doesn’t store NULLs if they’re the last things in the row, then that’s good advice. If on the other hand your DBMS happens to be one that has a preface of bits indicating whether the column is NULL, then it’s pointless. Documentation about this might for one InnoDB case be slightly misleading; however, my point is that most of those items will depend on your DBMS and the clauses that you use for a storage engine.
Nevertheless I’ll endorse the “NULLs last” tip because I’ve seen an old claim that it affects SQL Server too. That is, if it’s been a rule for more than one DBMS, even if it’s pointless, maybe it affects compatibility because others still think it’s real.
Old SQL Server rules, for what they’re worth, also include “fixed fields first” and “short before long” but they’re obsolete. Also if you’ve got InnoDB’s compact format then lengths of variable columns are in the header. So when you see claims that you have to journey through the varchar columns till you reach the char columns, be skeptical.
Autocompletion
Suppose in the past you said
CREATE TABLE n (s1 INT, s2 INT);
and now (after rehash) you start an INSERT statement:
INSERT INTO n (
at this point the ocelotgui autocompleter will present you with a hint and you can save a fraction of a second by selecting the first item so you don’t have to navigate.
Unfortunately you won’t save anything when the statement is SELECT, though, because the SELECT list comes before the FROM clause. That is, the GUI won’t know what table you’re using until you’ve already specified the columns.
Speaking of irritating design, I take this opportunity to mention the syntax:
Notice again how the GUI can’t hint about a name at the appropriate point, because it won’t know the library yet. I class this as a violation by the vendor of the advice “general before specific”.
Compression
If there is a “run” of bytes with the same value, then a “run-length encoding” (RLE) algorithm might save space. Thus if it is likely that 20 columns will all contain 0, then keeping them together is a good idea.
But it’s only an idea. The practical considerations are that (a) there might be no compression at all, (b) the typical compression is some variant of Lempel-Ziv, which wouldn’t require the columns to be exactly contiguous, (c) it’s hard to predict what algorithms and byte placements will actually save the most space without trying them all, and (d) they’re not all available unless you build from source.
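To show why adjacency matters to RLE specifically, here’s a toy run-length encoder (no storage engine really works this way): the same ten values cost 6 runs when the zeros are adjacent but 10 runs when they’re interleaved.

```python
def rle(values):
    # encode as (value, run_length) pairs
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

adjacent = [1, 2, 3, 4, 5, 0, 0, 0, 0, 0]        # zero-heavy columns kept together
interleaved = [1, 0, 2, 0, 3, 0, 4, 0, 5, 0]     # same values, zeros spread out
print(len(rle(adjacent)), len(rle(interleaved)))  # → 6 10
```

A Lempel-Ziv compressor, as noted above, would do fine either way, which is why this argument for column ordering is weak in practice.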
Consistency
Consistency between tables: if table #1 has columns in a particular order, then table #2 should too. Imitate the INFORMATION_SCHEMA tables because they are likely to be static, although the collations of string columns are likely to change. I’d add “imitate examples in the manual”, but only if the examples in the manual are themselves consistent. For example the MySQL manual example
CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20),
species VARCHAR(20), sex CHAR(1), birth DATE, death DATE);
has no explicit primary key, but the columns might be in order of importance and the final column is the one most likely to contain NULL. For example the MariaDB manual example
has the PRIMARY KEY clause mixed in the column definition but the UNIQUE clause outside the column definition, and has first name before surname.
Consistency between statements: if WHERE clauses often have WHERE a = 5 AND b = 6, or if ORDER BY clauses often have A, B ASC, or if INSERT statements often have (a, b), then the CREATE TABLE definition can serve as a guide by letting users know that a comes before b by design.
Consistency between definitions: if table #1 has one foreign key referencing table #X and then another foreign key referencing table #Y, and table #2 has one foreign key referencing table #y and then another foreign key referencing table #X, then ask yourself why the order of foreign keys in table #2 is not the same as the order of foreign keys in table #1. Ideally the order will be reflected in entity-relationship diagrams.
ocelotgui changes
The ocelotgui 2.4 release is available for download from github.
The debugger feature won’t work with MariaDB 11.5ff for reasons I’ll explain elsewhere. However, if you download the source and build, it will work. Or wait for ocelotgui 2.5 which (I hope) will be released soon.
Progress Software on 2024-04-19 said more about “considering” an offer for MariaDB plc (the company not the foundation).
They own NuSphere which had a dispute with MySQL AB which was settled in 2002. My happy history as a MySQL employee biases me but I thought that NuSphere was not acting angelically.
I think it won’t happen.
Update: Apparently it didn’t. The 2024-04-26 K1 recommended offer differed.
Here’s a snippet of what I’d like SHOW ENGINE INNODB STATUS to look like:
as opposed to what the server sends:
...
--------
FILE I/O
--------
Pending flushes (fsync): 0
295 OS file reads, 1 OS file writes, 1 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
...
In my last post I described a way to “Make SHOW as good as SELECT” which was possible because most result sets from SHOW etc. are at least table-like, but STATUS’s wall of text isn’t table-like. So I forced it into a table with these basic rules:
A line between lines of dashes (----), like FILE I/O, is the category of what follows.
Otherwise a line is a row but if it contains commas it is multiple rows.
Numbers can usually be extracted from text as different columns.
After that there’s still a bunch of fiddling, I put the details in source-code comments.
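A minimal sketch of the first two rules, fed with the FILE I/O snippet from above (this is my own simplification, not the ocelotgui code; the real thing does the fiddling and the rule-three number extraction too):

```python
def status_to_rows(text):
    """Force SHOW ENGINE INNODB STATUS text into (category, row) pairs."""
    rows = []
    category = ""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        # Rule 1: a line between lines of dashes is the category of what follows
        if set(lines[i]) == {"-"} and i + 2 < len(lines) and set(lines[i + 2]) == {"-"}:
            category = lines[i + 1]
            i += 3
            continue
        # Rule 2: otherwise a line is a row, but commas split it into multiple rows
        for part in lines[i].split(","):
            rows.append((category, part.strip()))
        i += 1
    return rows

sample = """
--------
FILE I/O
--------
Pending flushes (fsync): 0
295 OS file reads, 1 OS file writes, 1 OS fsyncs
"""
for row in status_to_rows(sample):
    print(row)
```

So the comma-bearing line becomes three rows, all tagged with the FILE I/O category, and from there WHERE clauses and the rest can be applied as if it were a table.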
Version 2.3
The new features related to SHOW etc. are now in a released version as well as in source code, downloadable from github.
In the rest of this post I’ll show a complete result from SHOW ENGINE INNODB STATUS; (“before”), and the same data from SHOW ENGINE INNODB STATUS WHERE 1 > 0; after ocelot_statement_syntax_checker has been set to ‘7’ (“after”). (Instead of copying the Grid Widget I copied from the History Widget after setting Max Row Count to 100.)
“before”
| InnoDB | |
=====================================
2024-03-19 12:39:45 0x7f80f01f3700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 43 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 2455 srv_idle
srv_master_thread log flush and writes: 2454
----------
SEMAPHORES
----------
------------
TRANSACTIONS
------------
Trx id counter 754
Purge done for trx's n:o < 0 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION (0x7f8111334680), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7f8111333b80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync): 0
295 OS file reads, 1 OS file writes, 1 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
---
LOG
---
Log sequence number 463193
Log flushed up to 463193
Pages flushed up to 362808
Last checkpoint at 362808
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 167772160
Dictionary memory allocated 853016
Buffer pool size 8064
Free buffers 7647
Database pages 417
Old database pages 0
Modified db pages 164
Percent of dirty pages(LRU & free pages): 2.033
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 273, created 144, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 417, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
state: sleeping
----------------------------
END OF INNODB MONITOR OUTPUT
============================
|
SHOW AUTHORS GROUP BY `Location` INTO OUTFILE 'tmp.txt';
You’re thinking “Hold it, MySQL and MariaDB won’t allow SHOW (and similar statements like ANALYZE or CHECK or CHECKSUM or DESCRIBE or EXPLAIN or HELP) to work with the same clauses as SELECT, or in the same places.” You’re right — but they work anyway. “Eppur si muove”, as Galileo maybe didn’t say.
I’ll explain how the ocelotgui client transforms the queries so that this is transparent: the user types such things where SELECTs would work, and gets result sets the same way that SELECT would produce them.
Flaws and workarounds
I’ll call these statements “semiselects” because they do what a SELECT does (they produce result sets) but they can’t be used where SELECT can be used: no subqueries, no GROUP BY or ORDER BY or INTO clauses, no way to choose particular columns and use them in expressions.
There are three workarounds …
You can select from a table in a system schema such as sys or information_schema or performance_schema, if available, if you have the privileges, and if its information corresponds to what the semiselect produces.
For the semiselects that allow WHERE clauses, you can use the bizarre “:=” assignment operator, such as
SHOW COLUMNS IN table_name WHERE (@field:=`Field`) > '';
and now @field will have one of the field values.
You can get the result set into a log file or copy-paste it, then write or acquire a program that parses, for example by extracting what’s between |s in a typical ASCII-decorated display.
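For example, the parsing program in the third workaround could be as simple as this Python sketch (my own illustration, not part of ocelotgui; it assumes there are no | characters inside the values):

```python
# Minimal sketch: parse a typical ASCII-decorated result table, e.g.
# copy-pasted mysql client output, by extracting what's between the |s.

def parse_ascii_table(text):
    """Return a list of rows, each a list of stripped cell values."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Data lines start and end with '|'; ruler lines like +---+ are skipped.
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    return rows

sample = """
+--------+---------+
| Engine | Support |
+--------+---------+
| InnoDB | DEFAULT |
| MEMORY | YES     |
+--------+---------+
"""
header, *data = parse_ascii_table(sample)
# header is ['Engine', 'Support']; data holds the two value rows
```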
Those three workarounds can be good solutions, I’m not going to quibble about their merits. I’m just going to present a method that’s not a workaround at all. You just put the semiselect where you’d ordinarily put a SELECT. It involves no extra privileges or globals or file IO.
Example statements
CHECK TABLE c1, m WHERE `Msg_text` <> 'OK';
SELECT * FROM (DESCRIBE information_schema.tables) AS x ORDER BY 1;
SHOW COLLATION ORDER BY `Id` INTO OUTFILE 'tmp.txt';
SELECT `Type` FROM (SHOW COLUMNS IN Employees) AS x GROUP BY `Type`;
SELECT UPPER(`Name`) from (SHOW Contributors) as x;
SHOW ENGINES ORDER BY `Engine`;
(SELECT `Name` FROM (SHOW CONTRIBUTORS) AS x
UNION ALL SELECT `Name` FROM (SHOW AUTHORS) AS y)
ORDER BY 1;
CREATE TABLE engines AS SHOW ENGINES;
How does this work?
The client has to see where the semiselects are within the statement. That is easy; any client that can parse SQL can do it.
The client passes each semiselect to the server, and gets back a result, which ordinarily contains field names and values.
The client changes the field names and values to SELECTs, e.g. for SHOW CONTRIBUTORS the first row is
(SELECT 'Alibaba Cloud' AS `Name`, 'https://www.alibabacloud.com' AS `Location`, 'Platinum Sponsor of the MariaDB Foundation' AS `Comment`)
and that gets UNION ALLed with the second row, and so on.
The client passes this SELECT to the server, and gets back the result as an ordinary SELECT result set.
Or, in summary, what the client must do is: pass the SHOW to the server, intercept the result, convert to a tabular form, send our SELECT … UNION ALL SELECT …; to the server, display.
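In outline, the conversion step could look like this Python sketch (my own illustration, not ocelotgui’s actual code; `rows_to_select` is a name I made up, and the escaping is simplified so that every value is treated as a string):

```python
# Hypothetical sketch of the conversion step: given the field names and
# row values that came back from a semiselect, build the
# SELECT ... UNION ALL SELECT ... statement that the client then sends.

def rows_to_select(field_names, rows):
    selects = []
    for row in rows:
        columns = ", ".join(
            "'%s' AS `%s`" % (value.replace("'", "''"), name)
            for name, value in zip(field_names, row)
        )
        selects.append("SELECT " + columns)
    # Parenthesize each row's SELECT and glue them with UNION ALL.
    return "(" + ")\nUNION ALL (".join(selects) + ")"

# The first row is the SHOW CONTRIBUTORS row quoted above;
# the second row is invented for illustration.
sql = rows_to_select(
    ["Name", "Location"],
    [["Alibaba Cloud", "https://www.alibabacloud.com"],
     ["Example Co", "https://example.com"]],
)
```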
However, these steps are all hidden; the user doesn’t have to care how it works.
Limitations
It requires two trips to the server instead of one. The client log will only show the semiselect, but the server sees the SELECT UNION too.
It will not work inside routines. You will have to CREATE TEMPORARY TABLE AS semiselect; before invoking a routine, in order to use the semiselect’s result set inside CREATE FUNCTION | PROCEDURE | TRIGGER.
Speaking of CREATE TEMPORARY TABLE AS semiselect, if there are VARCHAR columns, they will only be as big as the largest item in the result set.
It will not work inside CREATE VIEW.
Sometimes it will not work with nesting, that is, semiselects within semiselects might not be allowed.
Some rare situations will expose the SELECT result in very long column names.
Try it today if you can build from source
On Linux this is easy — download libraries that ocelotgui needs, download ocelotgui, cmake, make. (On Windows it’s not as easy, sorry.) The source, and the README instructions for building, are on github.
After you’ve started up ocelotgui and connected to a MySQL or MariaDB server, there is one preparatory step: you have to enable the feature. (It’s not default because these aren’t standard SQL statements.) You can do this by going to the Settings|Statement menu and changing the Syntax Checker value to 7 and clicking OK. Or you can enter the statement
SET OCELOT_STATEMENT_SYNTAX_CHECKER = '7';
Now the feature is enabled and you can try all the examples I’ve given. You’ll see that they all work.
Of course it’s made available this way, behind a setting, because the feature’s status is beta.
Try it a bit later if you can’t build from source
This will be available in executable form in the next release of ocelotgui, real soon now. If you have a github account, you can go to the github page and click Watch to keep track of updates.
Update: the release happened on 2024-03-18, source and executables are at https://github.com/ocelot-inc/ocelotgui.