What is a data type?

I’d suppose that these statements are generally accepted:

A data type is a set of values.

So any value that belongs to that set is said to “have” or “belong to” that data type.

And any column which is defined as “of” that data type is going to contain values which have that data type, only.

For example a column C may be defined as of type INTEGER and will only contain values which belong to the set of integers.

And now for the exceptions, details, caveats, and errors.

What is synonymous with data type?

There are no synonyms; no other term should ever be used.

Not datatype. Oracle’s manual has “datatype” but IBM’s doesn’t and Microsoft’s doesn’t, and it’s not in the standard.

Not domain. C.J.Date (Date on Database, Writings 2000-2006) says

I observe that some people don’t understand even yet that domains and types are the same thing. The SQL standard muddies the water, too, because it used the term DOMAIN to mean something else …”

From your perspective the implication should be: you can’t use domain as a synonym because users of SQL products will think you mean something else.

Not class. Although some people uses class and type interchangeably, class can also mean a set of objects (rather than just a set of values), so it’s a less precise term.

What does the standard say

Move now to what’s in the SQL:2011 standard document. We find:

A data type is a set of representable values. Every representable value belongs to at least one data type and some belong to several data types.

This is restating the common definition — “a data type is a set of values” — without significant limitations. By saying “representable”, the standard seems to be merely accepting that a DBMS shouldn’t include what it can’t store or display. By saying “several data types”, it’s acknowledging that the value denoted by the literal 9999 could fit in either INTEGER or SMALLINT.

Does a data type include length, character set, etc.?

Suppose the definition of a column is

COLUMN_NAME CHARACTER(32)

Is the data type CHARACTER, or is the data type CHARACTER(32)?

This is a difficult question. The best answer I can give, which I’ll claim is what the SQL:2011 standard would say if it could, is: “CHARACTER(32)”.

Yes the name of the data type is CHARACTER. The standard only defines a small set of data type names — CHARACTER, CHARACTER VARYING, CHARACTER LARGE OBJECT, BINARY, BINARY VARYING, BINARY LARGE OBJECT, NUMERIC, DECIMAL, SMALLINT, INTEGER, BIGINT, FLOAT, REAL, DOUBLE PRECISION, BOOLEAN, DATE, TIME, TIMESTAMP, INTERVAL — and of course “CHARACTER” is on that list but “CHARACTER(32)” is not, it is not a name.

On the other hand — and I think the standard’s vocabulary choice is unfortunate — the data type’s name does not identify the data type! The data type is actually identified by a “data type descriptor” which is defined with the entire <data type> clause, and which, as well as name, includes precision, character set, etc.

That is not explicitly stated, so I’ll support the interpretation with several quotes which show that it’s implicit.

“A data type is predefined even though the user is required (or allowed) to provide certain parameters when specifying it (for example the precision of a number).”

“Two data types,T1 and T2, are said to be compatible if T1 is assignable to T2,T2 is assignable to T1, and their descriptors include the same data type name.”

“The data types “CHARACTER(n) CHARACTER SET CS1” and “CHARACTER(m) CHARACTER SET CS2”,where CS1 ≠ CS2, have descriptors that include the same data type name (CHARACTER), but are not mutually assignable”

“If a <data type> is specified, then let [the data type of a domain] be the data type identified by <data type>.”

“If <data type> is specified, then [the declared type of the column] is that data type.”

“If the maximum length of the subject data type is fixed [then default literals shall not be too long].”

“All data types in [destination strings] shall be character string, and all of them shall have the same character repertoire.”

“[If the result of an operation is] not exactly representable with the precision and scale of the result data type [then it’s an error].”

“A site declared with a character string type may be specified as having a collation, which is treated as part of its data type.”

… I quote them merely for the sake of showing that length and character set and repertoire are all part of data type, and that the <data type> clause is what specifies or identifies the data type, and that two different data types can both have the same data type name.

This does not mean that CHARACTER(32) is a subtype of CHARACTER, because the standard is clear that predefined data types cannot have subtypes or supertypes. (User-defined types can have subtypes or supertypes; however, for user-defined types questions like “is the data type character(32)?” don’t arise.)

This does not mean that saying “the data type is CHARACTER” is wrong, because, whenever we’re talking abstractly or generally, that’s appropriate.

It does mean that “data type is CHARACTER(32)” is much more correct when the definition is known.

What about funny-looking values?

If a data type is supposed to be a set of values, but NULLs are allowed, then what is the set?

Taking the line that NULL represents UNKNOWN, and using MySQL/MariaDB limits for SMALLINT, we could say that the set is {-32768, -32767, -32766, … 0, …, +32767} — NULL is somewhere in that enumeration, we just don’t happen to know where. But NULL could mean other things besides UNKNOWN, and sometimes it definitely does — think of ROLLUP. The standard just says “Every data type has a special value, called the null value, denoted by the keyword NULL.” The set then is {NULL, -32768, -32767, -32766, … 0, …, +32767}.

If the limits of the floating-point sets are supposed to be “based on the IEEE standard”, as the MySQL manual says they are, then where are NaN (not a number) and INF (pseudo-infinity)?

This has been the subject of feature requests like Bug#57519 Documentation for IEEE 754 compliance is missing from Daniël van Eeden, and Bug#41442 Support ieee754 by supporting INF and NaN from Mark Callaghan.

Bug#41442, incidentally, says “the SQL spec doesn’t allow [INF and NaN]”. I’d put it less severely — the standard effectively disallows them for all the standard predefined numeric data types including FLOAT or DOUBLE PRECISION (for example it says that an approximate-numeric literal must have the form <mantissa>E<exponent>). The standard does not forbid new implementation-defined data types.

Other specific effects on our favourite DBMSs

As for “what is a data type in MySQL (or by extension in MariaDB)?”, the
correct answer is: whatever the MySQL or MariaDB documentation says. There used to be arguments about this, but I think they’re all in the past now.

Equally, there’s no point arguing about how TINYBLOB etc. should have been specified in terms of maximum length or precision, or about how ENUM should have been a [VAR]CHAR plus a constraint. The more feasible feature request would be along the lines of: more complete support for the standard types.

pgulutzan, December 23, 2014. Category: Standard SQL.

About pgulutzan

Co-author of four computer books. Software Architect at MySQL/Sun/Oracle from 2003-2011, and at HP for a little while after that. Currently with Ocelot Computer Services Inc. in Edmonton Canada.

View all posts by pgulutzan

Stored Procedures: critiques and defences →

← General Purpose Storage Engines in MariaDB