August 2020 – SQL And Its Sequels

Month: August 2020

The SQL MOD Function

What should be the result of -11 % 4, or MOD(11, -4)?

This has been controversial and you will get different results in different languages. However, in SQL there is only one correct answer, the one that the SQL standard requires. Good news: your favourite SQL DBMS gives the correct answer. But what it is, and what the proof is, will need lots of explaining.

Modulus moduli modulo

Oxford English Dictionary entry for modulus: “… 2b A whole number used as a divisor in a system of arithmetic (modular arithmetic) in which integers having the same remainder when divided by this number are regarded as equivalent. Cf. congruent adj. 5, modulo prep., residue n. 3b.”

Oxford English Dictionary entry for modulo: “Origin: A borrowing from Latin. Etymons: Latin modulō, modulus. Etymology: < classical Latin modulō, ablative of modulus modulus n. Compare earlier mod prep. The preposition was first used in this sense in a Latin context by Gauss in 1801 (see note s.v. mod prep.). … a. With respect to a modulus of (a specified value). See modulus n. 2b.”

Carl Friedrich Gauss wrote Disquisitiones Arithmeticae in Latin. In Latin, moduli is the plural of modulus and modulo is the ablative. In this context the ablative modulo indicates “by means of the modulus”.

And so R = N MOD M means get a result R by applying to a number N “by means of the modulus M“. In SQL this is the standard function
MOD(N, M)
or the common operator
N % M.
I highlight the fact that the modulus here is M — sometimes I’ve seen hints that people think the modulus is the result of the function, but that’s not the original meaning. There are good explanations of “modular arithmetic” on the Khan Academy site and in Encyclopedia Britannica which use the term in the original way, and use the common illustration of a circular clock that has 12 hours marked on it:

If it is 7 o’clock, and you add 7 hours, then it is 2 o’clock. It doesn’t mean 7+7=2, it means 7+7 and 2 are “congruent” by means of modulus 12. So are (7+7+12) and 2, (7+7+12*2) and 2, and so on. The mod function should answer: what’s the congruent number that fits on the clock?

Anybody can see that, but real-world examples of a “negative clock” are harder to find. I can think of a clock that counts down before a rocket launch, but it doesn’t go in a cycle.Nevertheless, if it did, then the hours are not from 0 to 11, they are from 0 to -11. Now, if you add 7 hours to “minus 7 o’clock” or if you subtract 7 hours from “minus 7 o’clock”, arguably you should get a negative number, as no positive number fits on a negative clock.

On the other hand, people want to use this function to get remainders and remainders are normally positive numbers regardless what you use as a divisor. So what’s desirable?

The three main possibilities

Daan Leijen wrote “Division and Modulus for Computer Scientists” following some work by Knuth and Wirth and others. It is not in Latin. But it does have some mathematical symbols, which I’ll translate …

Assume divisor <> 0.
(1) the quotient is a member of the set of integers.
(2) Dividend = divisor times quotient + remainder.
(3) ABS(remainder) < ABS(divisor).

Those are the universal rules. The additional rules can be:

(a) “Truncate”, i.e. trunc(dividend / divisor) – (dividend * result)
(b) “Floor”, i.e. it’s the remainder of round(quotient) towards infinity)
(c) “Euclidean”, i.e. what Mr Leijen regards as correct, see his lengthy explanation, but he admits it’s rarely used.

The key illustration is Mr Leijen’s “Comparison of T-, F- and E-division”

D,d     qT,rT   qF,rF   qE,rE
---     -----   -----   -----
(+8,+3) (+2,+2) (+2,+2) (+2,+2)
(+8,−3) (−2,+2) (−3,−1) (−2,+2)
(−8,+3) (−2,−2) (−3,+1) (−3,+1)
(−8,−3) (+2,−2) (+2,−2) (+3,+1)

where T is Truncate, F is Floor, E is Euclidean, D is Dividend, d is divisor, q is quotient, r is remainder,

The exciting part is that when the Dividend is negative (-8,+3 or -8,-3), or when the divisor is negative (+8,-3 or -8,-3) the column with “rT”, i.e. remainder with Truncate, is not the same as the column with “rF”, i.e. remainder with Floor.

Modern C goes with (a) Truncate, which tends to be well supported by the processor. If you write and run this program:

#include <stdio.h>
void main()
{
  printf("%d %d %d %d\n", ( 11)%( 4), ( 11)%(-4), (-11)%( 4), (-11)%(-4));
}

You will see: 3 3 -3 -3. Ditto Java and many other languages. But Lua uses (b) Floor and there is an argument for it in “A small practical example where Lua’s % behavior is better than C’s”. Also Python uses (b) Floor; Guido van Rossum explains in “Why Python’s Integer Division Floors”.

Lisp has separate mod and rem functions. Maybe that was for the best. But it’s too late for us.

What SQL vendors do

A copy-and-paste from the ocelotgui client’s History widget:

mysql>SELECT ( 11)%( 4) , ( 11)%(-4) , (-11)%( 4) , (-11)%(-4);
OK 1 rows affected (0.5 seconds)
+------------+------------+------------+------------+
| ( 11)%( 4) | ( 11)%(-4) | (-11)%( 4) | (-11)%(-4) |
+------------+------------+------------+------------+
| 3          | 3          | -3         | -3         |
+------------+------------+------------+------------+

Same in MariaDB. Same in Tarantool. Same in PostgreSQL. Same in Oracle … but Oracle warns: this is not the “classical” modulus operation. Then Oracle illustrates what they call “classical” and, it’s obvious from that illustration, they mean Floor. IBM uses the same example.

Therefore the overwhelming opinion among SQL vendors is (a) Truncate.

What the standard says

There is no % operator. There is a MOD() function.

"
<modulus expression> ::=
  MOD <left paren>
      <numeric value expression dividend>
      <comma>
      <numeric value expression divisor>
      <right paren>
...
If <modulus expression> is specified,
then the declared type of each
<numeric value expression>
shall be exact numeric with scale 0 (zero).
The declared type of the result is the
declared type of the immediately contained
<numeric value expression divisor>.
...
9)If <modulus expression> is specified,
  then let N be the value of the immediately
  contained <numeric value expression
  dividend> and let M be the value of the
  immediately contained <numeric value
  expression divisor>.
  Case:
    a)If at least one of N and M is the null
      value, then the result is the null value.
    b)If M is zero,
      then an exception condition is raised:
      data exception—division by zero.
    c)Otherwise, the result is the unique exact
      numeric value R with scale 0 (zero) such
      that all of the following are true:
      i)R has the same sign as N.
      ii)The absolute value of R is less than
         the absolute value of M.
      iii)N = M * K + R for some exact numeric 
          value K with scale 0 (zero).

“

Look again at Mr Leijen’s universal rules, above. You’ll notice that the numeric-with-scale-0 requirement is equivalent to “(1) the quotient is a member of the set of integers.”, and you’ll notice that clause b) is equivalent to “Assume divisor <> 0”, and you’ll notice that clause c)iii) is equivalent to “(2) Dividend = divisor times quotient + remainder”, and you’ll notice that clause c)ii) is equivalent to “(3) ABS(remainder) < ABS(divisor).” So, except for clause c)i), all the rules are the rules that he described.

So what is -11 % 4?
It is the same as MOD(-11, 4).
Therefore -11 is the dividend and 4 is the divisor.
Therefore -11 is N and 4 is M.
a) does not apply because neither M nor N is NULL.
b) does not apply because M is not zero.
c) means for the expression N = M * K + R, we must solve for K and R.

Plugging in the values of N and M, we have
-11 = 4 * K + R.
Since R must have the same sign as N, R must be negative.
Therefore K must be negative!
To see why, plug in any value of K which is >= 0. For example:
-11 = 4 * 1 + -15
No. This violates c)ii) because ABS(-15) > ABS(4).
So now we know that
-11 = 4 * (K which is a negative number) + (R which is a negative number).
Let us try with K >= -1:
-11 = 4 * (-1) + (-7)
No. This violates c)ii) because ABS(-7) > ABS(4).
Okay, how about if K <= -3:
-11 = 4 * (-3) + (+1)
No. This violates c)i) because N is negative and R is positive.
Okay, how about K = -2:
-11 = 4 * (-2) + (-3)
This does not violate c)i) because N is negative and R is negative.
This does not violate c)ii) because ABS(-3) < ABS(4).
Great, so R = -3.

Thus it is established by example that the standard requires (a) Truncate.

Using non-standard extensions

Although in standard SQL the correct action for MOD(N, 0) is an error “division by zero”, not every vendor does that — they might return N or they might return NULL.

Although standard SQL MOD() is for integers, some vendors allow FLOAT or DECIMAL. And here is a trick: To get the scalar part of a DECIMAL number:
5.5 % 1
result should be .5.
Perhaps you are thinking “Nonsense! The correct way is
5.5 - FLOOR(5.5)
but I think it depends whether you want to preserve the sign.

Although the standard says that the result data type should be the same data type as M, some vendors will return a value with different precision and scale.

New ocelotgui version

Version 1.1.0 of ocelotgui was released on July 31. So head over to our github site now and download.

pgulutzan, August 4, 2020. Category: Standard SQL.