When I worked for MySQL I saw frequent complaints that the UNIQUE constraint didn't stop users from inserting NULLs, multiple times. For example:
CREATE TABLE t (s1 INT, UNIQUE (s1));
INSERT INTO t VALUES (NULL);
INSERT INTO t VALUES (NULL); /* This does not cause a "unique constraint violation" error */
There are now eleven generously-commented bugs.mysql.com reports:
#5685, #6829, #7479, #8173, #9844, #17825, #19377, #25544, #27019, #27376, #66512. The essential points (along with observations about how we were stupid or deaf or lazy) are:
The first point is false; the second point depends what "major" means; the third point is a valid opinion.
I will now attempt to beat this subject to death with enough thoroughness that it will become dead.
Our example will be a table named t created thus:
CREATE TABLE t (s1 INT, UNIQUE (s1));
INSERT INTO t VALUES (NULL),(NULL);
So t looks like this:
+------------+ | s1 | +------------+ | NULL | | NULL | +------------+
Our question will be: is the constraint "UNIQUE (s1)" violated according to the SQL standard?
I'll use a recent draft of SQL:2011 because it's current and it's available for all to see, at:
jtc1sc32.org/doc/N1951-2000/32N1964T-text_for_ballot-FCD_9075-2.pdf.
The wording is about the same in all earlier versions of the standard since SQL-89, which didn't allow NULLs at all for UNIQUE constraints.
Section 11.7 says:
[Syntax Rules]
4) ... let SC be the <search condition>:
UNIQUE ( SELECT UCL FROM TNN )
where UCL means "Unique Column List" and TNN means "Table Name".
So for our example, search condition SC is "UNIQUE (SELECT s1 FROM t)".
[General Rules]
3) The unique constraint is not satisfied if and only if
EXISTS ( SELECT * FROM TNN WHERE NOT ( SC ) )
is True.
... Plugging in our values, that means
3) The unique constraint is not satisfied if and only if
EXISTS ( SELECT * FROM t WHERE NOT ( UNIQUE (SELECT s1 FROM t) ) )
is True.
So now we have to know how "UNIQUE(...)" is supposed to work.
Section 8.11 says:
If there are no two rows in T such that the value of each column in one row is non-null and is not distinct from the value of the corresponding column in the other row, then the result of the <unique predicate> is True; otherwise, the result of the <unique predicate> is False.
... where T is the table.
Now, apply the predicate "UNIQUE (s1)" to table t.
For either Row#1 or Row#2: Is the value of column s1 non-null?
Answer: NO. It is NULL.
Therefore the requirement "that the value of each column in one row is non-null" is met for zero rows in t.
But the <unique predicate> is only False if the requirement is met for two rows in t. Therefore the result of the <unique predicate> is True.
In other words,
UNIQUE (s1) is True.
Therefore
NOT (UNIQUE (s1)) is False.
Therefore
SELECT * FROM TNN WHERE NOT (UNIQUE (t)) returns zero rows.
Therefore
EXISTS ( SELECT * FROM TNN WHERE NOT ( UNIQUE (t) ) ) is false.
But Rule 3) says that the unique constraint is not satisfied if and only if that EXISTS clause is True.
Therefore the constraint is satisfied.
And when a constraint is satisfied, it is not violated.
Thus, according to the rules for <unique constraint definition>, our example table violates no constraint.
It's tedious to work through this because there are multiple negatives, but it is not ambiguous.
Why, then, do some people have trouble? Partly because they look at a different statement in the introductory sections. Section 4.17 Integrity constraints says:
A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.
And I have to admit that it's not at all obvious what that's supposed to mean. But, since you have now seen what is in the later and more detailed sections, you should read it as "no two rows have unique-column values which are the same (an undefined word which probably means equal) and which are also non-null".
Trudy Pelzer and I wrote this on page 260 of our book, SQL Performance Tuning:
DBMS | Maximum number of NULLs when there is a UNIQUE constraint |
IBM (DB2) | One |
Informix | One |
Ingres | Zero |
InterBase | Zero |
Microsoft (SQL Server) | One |
MySQL | Many [although the BDB storage engine was an exception] |
Oracle | Many |
Sybase | One |
DB2 has an optional clause which can force standard behaviour.
Oracle is only allowing multiple NULLs because it doesn't put NULLs in indexes; if we were talking about a multiple-column constraint, Oracle would be non-standard.
The book didn't cover PostgreSQL and Access and Firebird, which all allow "Many" NULLs. A newer entrant, NuoDB, allows only one NULL (I told them that's non-standard but they didn't believe me).
Most of the complainers were apparently thinking of Microsoft SQL Server. So it's interesting that on Microsoft's discussion boards the complaints are (spoiler alert!) that SQL Server doesn't follow the standard and should be more like other DBMSs. See, for example, NULL in unique index (where SQL Server gets it wrong) and Change UNIQUE constraint to allow multiple NULL values.
According to SQL Server Magazine, in 2008,
ANSI SQL supports two kinds of UNIQUE constraints -- one that enforces uniqueness of NULLs just like with known values, and another that enforces uniqueness of known values but allows multiple NULLs. Microsoft SQL Server implemented only the former.
As we've seen, that's false, and so is another claim, that "Informix and Microsoft SQL Server follow the other interpretation of the standard." But perhaps Microsoft's stories help explain people's beliefs.
An unfortunate defence of the allow-many-NULLs behaviour is that "Null values are not considered equal". That's what's happening in effect in this particular case, but it's not a general or useful rule.
Therefore, the UNIQUE rules do not follow automatically from how NULLs behave elsewhere in SQL. That is why the standard had to add wording to specify that the uniqueness predicate applies only for "non-null" values.
So, regarding what looks more sensible, it seems that the complainers have a point.
It is better, now, to carry on as before. The current behaviour is the de facto and the de jure standard. It is neither a bug nor undesirable.
But would it be even better if MySQL or MariaDB allowed an option? The DB2 syntax to allow UNIQUE with many NULLs is:
CREATE TABLE t (s1 INT UNIQUE WHERE NOT NULL)
so presumably the way to specify UNIQUE with one NULL would be:
CREATE TABLE t (s1 INT UNIQUE WHERE NULL OR NOT NULL)
but it would be easier to understand if we said
CREATE TABLE t (s1 INT UNIQUE AND MAXIMUM NUMBER OF NULLS = 1).
Anyway, it would be possible. But a nice first step would be to acknowledge that it is a feature request rather than a bug.
Copyright (c) 2013 by Ocelot Computer Services Inc. All rights reserved.
Hi Peter, I'm just stopping by to say hi!
It has been a while since I read your articles - it's great to do so again 🙂 You're skills in dissecting topics like this, and your thoroughness in reasoning about them is simply delightful to witness. Kudos!
warm regards,
Roland Bouman
Roland: thank you very much, and it's great to see that you're still around too.
Well, all through the years with MySQL, this is exactly how I looked at it, but many disagreed or didn't understand the issue. My reasoning was more simple than yours though (but I have looked it up in SQL-99): Since NULL isn't a value, rather it's the absence of a value, then all comparisons fail.
Or: Here is a parking spot, and only one car can park there at the time. So if car A is there, car B cannot be there at the same time. But when there is an absence of a car in the parking spot, then both A and car B isn't there. So if the value of the spot is NULL (no car parked), then both statements "Car A isn't here" and "Car B isn't here", actually any car, isn't there. Hmm this started out as a god example, but rereading it I don't know anymore. Also, this is not true if this is Larrys parking spot. If your car A is in that spot, then it will probably be shot at by fighter jet Mig-21 B.
Actually comparisons don't "fail", they succeed, the result is unknown. So, as I try to suggest in the post, it's not NULL=NULL comparisons that establish correct behaviour, it's what the standard says about this specific situation.
Keep in mind that NULL is not a value! It is a flag that says, "This item is not a value."
That's the simple test I keep in mind to help me through such things. Too many people have been polluted by C-like languages, that often define NULL as the same as zero. I prefer the Smalltalk/SIMULA heritage, where the nil object is equivalent to itself, but not equal to anything else.
Hi,
Thanks for an interesting post.
I always regarded NULL as no information/not defined and to me it seems SQL is of the same opinion since NULL = NULL is false, (something I many times have had problems to explain to others).
My opinion is; it is more convenient to allow many NULLs in unique relations than not, i.e. do not put NULLS in unique indexes
Mr Johansson and Mr Steinman: Thanks for the comments, but the statement "NULL = NULL is false" is wrong, and it has no effect on this issue whether NULL is called a value so let's let that be a topic for another day. [ Update: I originally wrote 'false' rather than 'wrong' but that was too glib; the point though is addressed in the original post. ]
Peter, typos:
You say
"But the unique predicate is only True if the requirement is met for two rows in t. Therefore the result of the unique predicate is False."
Should be
"But the unique predicate is only False if the requirement is met for two rows in t. Therefore the result of the unique predicate is True."
You say
"In other words, UNIQUE (s1) is False. Therefore NOT (UNIQUE (s1)) is True."
Should be
"In other words, UNIQUE (s1) is True. Therefore NOT (UNIQUE (s1)) is False."
The rest is correct, "SELECT * FROM TNN WHERE NOT (UNIQUE (t)) returns zero rows." precisely because NOT(UNIQUE(t)) is False.
Thanks for the observation, and thanks for kindly saying I made a mere typo when I wrote False instead of True, and vice versa. I have edited the post.