Month: July 2014


The BINARY and VARBINARY data types

MySQL’s support of the BINARY and VARBINARY data type is good, and the BINARY and VARBINARY data types are good things. And now the details. What applies here for MySQL applies for MariaDB as well.

Who supports VARBINARY

There’s an SQL:2008 standard optional feature T021 “BINARY and VARBINARY data types”. Our book has a bigger description, but here is one that is more up to date:

DBMS Standard-ish? Maximum length
DB2 for LUW No. Try CHAR FOR BIT DATA. 254 for fixed-length, 32672 for variable-length
DB2 for z/OS Yes. 255 for fixed-length, 32704 for variable-length
Informix No. Try CHAR. 32767
MySQL Yes. constrained by maximum row length = 65535
Oracle No. Try RAW. 2000 sometimes; 32767 sometimes
PostgreSQL No. Try BYTEA. theoretically 2**32 – 1
SQL Server Yes. 8000 for fixed-length. 2**31 -1 for variable length

Standard Conformance

Provided that sql_mode=strict_all_tables, MySQL does the right (standard) thing most of the time, with two slight exceptions and one exception that isn’t there.

The first exception:

CREATE TABLE t1 (s1 VARBINARY(2));
INSERT INTO t1 VALUES (X'010200');

… This causes an error. MySQL is non-conformant. If one tries to store three bytes into a two-byte target, but the third byte is X’00’, there should be no error.

The second exception:

SET sql_mode='traditional,pipes_as_concat';
CREATE TABLE t2 (s1 VARBINARY(50000));
CREATE TABLE t3 AS SELECT s1 || s1 FROM t2;

… This does not cause an error. MySQL is non-conformant. If a concatenation results in a value that’s longer than the maximum length of VARBINARY, which is less than 65535, then there should be an error.

The exception that isn’t there:
The MySQL manual makes an odd claim that, for certain cases when there’s a UNIQUE index, “For example, if a table contains ‘a’, an attempt to store ‘a\0’ causes a duplicate-key error.” Ignore the manual. Attempting to insert ‘a\0’ will only cause a duplicate-key error if the table’s unique-key column contains ‘a\0’.

The poem Antigonish desccribed a similar case:

Yesterday, upon the stair,
I met a man who wasn’t there.
He wasn’t there again today,
I wish, I wish he’d go away…”

The BINARY trap

Since BINARY columns are fixed-length, there has to be a padding rule. For example, suppose somebody enters zero bytes into a BINARY(2) target:

CREATE TABLE t4 (s1 BINARY(2));
INSERT INTO t4 VALUES (X'');
SELECT HEX(s1) FROM t4;

… The result is ‘0000’ — the padding byte for BINARY is X’00’ (0x00), not X’20’ (space).

There also has to be a rule about what to do for comparisons if comparands end with padding bytes.

CREATE TABLE t5 (s1 VARBINARY(2));
INSERT INTO t5 VALUES (X'0102');
SELECT * FROM t5 WHERE s1 = X'010200';

… This returns zero rows. It’s implementation-defined whether MySQL should
ignore trailing X’00’ during comparisons, so there was no chance of getting
it wrong.

The behaviour difference between BINARY and VARBINARY can cause fun:

CREATE TABLE t7 (s1 VARBINARY(2) PRIMARY KEY);
CREATE TABLE t8 (s1 BINARY(2), FOREIGN KEY (s1) REFERENCES t7 (s1));
INSERT INTO t7 VALUES (0x01);
INSERT INTO t8 SELECT s1 FROM t7;

… which fails on a foreign-key constraint error! It looks bizarre
that a value which is coming from the primary-key row can’t be put
in the foreign-key row, doesn’t it? But the zero-padding rule, combined
with the no-ignore-zero rule, means this is inevitable.

BINARY(x) is a fine data type whenever it’s certain that all the values
will be exactly x bytes long, and otherwise it’s a troublemaker.

When to use VARBINARY

VARBINARY is better than TINYBLOB or MEDIUMBLOB because it has a definite
maximum size, and that makes life easier for client programs that want to
know: how wide can the display be? In most DBMSs it’s more important that BLOBs can be stored separately from the rest of the row.

VARBINARY is better than VARCHAR if there should be no validity checking.
For example, if the default character set is UTF8 then this is illegal:

CREATE TABLE t9 (s1 VARCHAR(5));
INSERT INTO t9 VALUES (0xF4808283);

… but this is legal because character set doesn’t matter:

CREATE TABLE t10 (s1 VARBINARY(5));
INSERT INTO t10 VALUES (0xF4808283);

(I ran into this example on a SQL Server forum where the participants display woeful ignorance of Unicode).

And finally converting everything to VARBINARY is one way to avoid the annoying message “Invalid mix of collations”. In fact the wikimedia folks appear to have changed all VARCHARs to VARBINARYs back in 2011 just to avoid that error. I opine that the less drastic solution is to use collations consistently, but I wasn’t there.