Copyright (c) 2020 by Peter Gulutzan. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
All references to "this program" or "software" mean "this guide", the Descriptive SQL Style Guide. The copy of the GNU General Public License is the final section of this guide.
Version 0.1.0 2020-04-30
Copied from github to ocelot.ca with permission of the author. Please place any feedback on https://github.com/pgulutzan/descriptive-sql-style-guide/issues.
This is a descriptive style guide, like a dictionary that tells you what is common usage, with citations. To find common usage I looked at vendor manuals, expert blogs, and prescriptive style guides (guides that tell you what you should do). Status = alpha.
I think that anyone might use this to decide what code conventions are best for their own organizations, based on orthodoxy, logic, style used for another language's style guide, and chosen DBMS vendor.
The prescriptive style guides are the ones on github, in English, that have more than 100 stars at time of writing:
Konstantin Taranov,
SQL Server Name Convention and T-SQL Programming Style,
841 stars.
Simon Holywell,
SQL Style guide.
811 stars. Partly based on the book "Joe Celko's SQL Programming Style".
Matt Mazur,
Mazur's SQL Style Guide.
441 stars. Mr Mazur says his guide is "opinionated".
Fred Benenson,
KickStarter, SQL Style Guide
240 stars. Some PostgreSQL emphasis.
Philipp Salvisberg,
"Trivadis PL/SQL & SQL Coding Guidelines"
62 stars + favourable reviews for example by Steven Feuerstein.
Also available as
pdf. Oracle emphasis.
Mark Donnelly,
SQL Style guide.
119 stars. Last updated in 2013.
For the "Names" sections I also consult these non-github web pages because they have details about specific object types:
Tim Hall,
Oracle naming conventions.
Peter Gulutzan,
"SQL Naming Conventions".
Sehrope Sarkuni,
"How I Write SQL, Part 1: Naming Conventions".
Craig Mullins,
"On DB2 Naming Conventions".
Also at
datavail.com
and
db2portal.
Jeffrey Keller,
"An Unemotional Logical Look at SQL Server Naming Conventions".
For the "Format" sections I also consult this book chapter:
Phil Factor.
"SQL Server Team-Based Development. Chapter 1: Writing Readable SQL".
Section = Code Layout.
The vendor manuals are:
Oracle Release 20 or 19 SQL Language Reference.
DB2 manual 11.5.
SQL Server: 2019.
MySQL: 8.0.
MariaDB 10.
Tarantool 2.3.
Although I am a former or current employee of some of these companies,
I am only using information that is publicly available.
For SQL Developer's "code editor" also called "worksheet" I rely on www.thatjeffsmith.com and totierne.blogspot.com.
Finally, I will sometimes quote the ANSI/ISO SQL standard, and many bloggers whom I will identify as they come up.
Inconsistency happens so I try to look for multiple examples.
In following sections I will identify style guides by the author's surname, and identify vendor manuals by the product's name.
My definition of SQL style is: consistently choosing words or formats that do not affect what the DBMS returns, often describable as a list of rules that formalize what words or formats to choose, for example saying INTEGER rather than int. Or, more simply: what to say when there are two ways to say (almost) exactly the same thing.
I do not bother with rules that only affect one vendor.
I begin most sections with the word Choice: followed by settings that you can pick. Then I describe what the choices mean, then what prescriptive guides say, what I happen to have seen in vendor manuals (warning: sampling may be affected by chance), and what some bloggers or other sources might have said if I happened to notice them.
I want to avoid looking biased for one style or one vendor, so I do not use a consistent style in examples.
Choices
Keywords and letter case
Semicolons
Not-equal operator
Unnecessary keywords and operators
SELECT *
ORDER BY ordinal
Comments
Multiple-line comments
New or old style inner join
Data types
Literals
Long literals
Names (meaning)
Names (letter case)
Names (legal characters
Names (length)
Names (delimiter)
Names (qualifier)
Names (prefix or suffix)
Names of tables
Names of temporary tables
Names of views
Names of columns
Names of aliases and range variables
Names of constraints
Names of indexes
Names of triggers
Names of sequences
Names of functions (or: names of routines)
Names of savepoints
Names of collations
Names of variables or parameters
Dynamic SQL
Format terminology
Format choices terminology
Format symbols
Format parentheses
Format comments
Format line length
Indenting units
Indenting
Format clauses deciding what is a clause
Format clauses by indenting
Format clauses by right-aligning keywords
Format clauses by aligning contents
Format lists
Format conditions
Format subqueries
Format WITH
Format joins
Format INSERT
Format UPDATE
Format CREATE TABLE
Format CREATE VIEW
Format CREATE PROCEDURE or CREATE FUNCTION
Format CREATE TRIGGER
Format CASE expression
Format blocks the usual way
Format blocks with analogies to other guides and choices
Format DECLARE
Format overflow
Formatters or pretty printers
Contributors
GPL Version 2 License
Choice: worry about style?
I will put "Choice: ...?" questions in most sections.
You do not need to care if ...
You have a good-enough tool that does the job for you.
You only work with SQL occasionally, and mostly for yourself.
You see that fussing about appearance should be low priority.
You have read one of the prescriptive guides and are satisfied.
If any of those things apply to you, good, you are in the
majority and now you can go off and find better things to do.
You only need to care if ...
You want to come up with a better tool.
You are in a team whose boss insists on consistency.
You would like to know what, if anything, justifies the rules.
If any of those things apply to you, too bad,
now you will have to check "worry about style = yes"
and make similar checks for all the "Choice:" matters
that follow.
Choices are not inheritable because I do not classify well.
Choice: Keywords upper case or lower case?
Google Ngrams shows relative popularity of words and phrases in books, and it distinguishes upper versus lower case.
I used it for terms that are likely to appear only in database books:
CREATE TABLE is more popular than create table.
SELECT DISTINCT is more popular than select distinct.
TO SAVEPOINT is more popular than to savepoint.
EXECUTE IMMEDIATE is more popular than execute immediate.
INTEGER PRIMARY KEY is more popular than integer primary key.
So saying "in general keywords are upper case" has some evidence.
Similarly, a poll by Lukas Eder resulted in a majority for SELECT in upper case.
Exception #1: when a keyword is not being used as a keyword. For example, IMMEDIATE is in the SQL standard's "non-reserved word" list, so I can use it as a column name. I have trouble believing that the guides which say "capitalize keywords" mean that I should capitalize IMMEDIATE in that circumstance. I think they really meant "capitalize words that are not identifiers", which coincidentally excludes reserved words.
Exception #2: when trying to avoid SHOUTING or when trying to be like most other languages. This might be reasonable when entering statements with a client that highlights properly. See this argument on stackoverflow.
Exception #3: data type names. See later section = Data types.
Prescriptive guides:
Benenson, Donnelly, Holywell, Salvisberg say: upper case.
Taranov says: upper case, except for data types.
Mazur says: lower case ("It's just as readable as uppercase SQL and you will not have to constantly be holding down a shift key.")
Vendor manual examples:
MySQL, MariaDB, Oracle, SQL Server, Tarantool: upper case for keywords.
DB2: upper case for all words, whether or not they are keywords.
Oracle SQL Developer has an option "Case change" with choices "UPPER", "lower", "keep unchanged", and "Init cap"
(which presumably is intended for identifiers more than keywords).
Bloggers:
The Drupal manual says: "Make SQL reserved words UPPERCASE. This is not a suggestion. Drupal db abstraction commands will fail if this convention is not followed."
Ian Hellström in "Execution Plans" notes that Oracle uses a hash of the statement text to see whether it is in the library cache. Thus even a tiny difference between two statements can cause a cache miss and affect performance.
According to Don Burleson capitalization of keywords does not matter, but according to Morton and Osborne and Sands it does matter. Probably they're looking at different versions. Anyway, that means that maybe sometimes in theory inconsistency will affect performance.
A TPC-H example has all lower case.
Choice: End all statements with semicolons?
In standard SQL ";" is not part of a statement, it is a signal that the statement is over, so it is required for direct SQL (such as SQL typed to a client program) and within BEGIN ... END, but not dynamic SQL (such as execute immediate, or SQLExecDirect in the call level interface). Thus if a JDBC driver rejects a semicolon-terminated statement with "SQLSyntaxErrorException: ORA-00911: invalid character", which has been known to happen, it is within its rights. Also see this cartoon.
But all the vendor manuals either show that ";" is an optional part of a statement
(as in
this SQL Server bnf), or show it in examples.
The key problems are:
(1) if you do not know when statements end, you cannot run a script containing multiple statements
(2) it is hard to know when statements end, for example "SELECT 5 BEGIN" is a valid single statement
(in a certain dialect where BEGIN is not a reserved word and [AS] is not required),
but simple parsers could easily think it is more than one.
Of course, a client might get confused and think that <semicolon> <newline> is end-of-statement (this has happened with Toad [] within comments, and with other clients within BEGIN ... END blocks). However, vendors have options for changing what the client thinks is the terminator -- SET TERMINATOR for DB2, DELIMITER for MySQL/MariaDB, SET SQLTERMINATOR for Oracle, \set delimiter for Tarantool. Of course, I like to emphasize that there is a cleaner solution -- use a client with a recognizer that supports client-side predictive parsing.
It is not true that Microsoft has deprecated the feature "Not ending Transact-SQL statements with a semicolon". Read the fine print of Microsoft's Deprecated Notice. It says that the feature will be "supported in the next version" although it "will be deprecated in a later version", which they have been saying for over a decade and lack of ; has been known to cause an error, but they probably are holding back because there is so much installed code.
Prescriptive guides: Taranov says end with ";". Factor says "generally speaking" end with ";"
Vendor manuals:
Usually ";" is not in the BNFs or railroad diagrams (although Oracle does show it and Microsoft does show it as [;] i.e. optional); however, most manuals' examples end with semicolons (except DB2's).
Bloggers:
Dan Guzman says end with ";".
Don Burleson says it depends.
Jeff Smith says it depends.
Factor says "Use the semicolon to aid the reading of code, even where SQL syntax states that it is only optional."
The best explanation of SQL Server anomalies that I have seen is in "Rules of SQL formatting – Terminating SQL statements with semicolons".
Choice: Not-Equal Operator: <> or != or ^= or ¬=?
The main variants are != and <> which are accepted by all major vendors.
DB2 and Oracle also support ^= and ¬=. ¬ is the Unicode standard symbol U+00AC "NOT SIGN" but since it is not in 7-bit ASCII. it will not be recognized if you have the wrong code page. Accordingly Oracle says "Some forms of the inequality condition may be unavailable on some platforms" and IBM says "A logical not sign (¬) can cause parsing errors in statements passed from one DBMS to another."
!= is found in C, Java, Perl, Python and the like. The Python manual used to accept <> along with the warning "The <> spelling is considered obsolescent." ... now it does not mention <> at all. The idea is that ! is a "not sign" (if you know C and do not know Unicode) and = is an equals sign, so this should be easy reading,
<> is found in Access, BASIC, Pascal, and Rexx (remember them?). The idea is that <= means "less than or equal", so in a consistent world <> means "less than or greater than".
<> is standard but != is more common in Oracle examples that I have seen.
That reminds me that once upon a time someone reported a case where != was faster than <> and spawned many answers.
Prescriptive guides:
Mazur says: Use != over <> ... Simply because != reads like "not equal" which is closer to how we'd say it out loud."
Benenson uses != in an example.
Vendor manuals:
Oracle's inequality test example uses !=
MySQL's example uses both != and <>
SQL Server example uses both != and <>
DB2's example uses <>
Bloggers:
Itzik Ben-Gan says "This case should be a nobrainer: go for the standard one!"
An iBATIS thread shows a user getting into trouble because XML complains about an SQL statement that contains <> and seeing two proposed solutions: use cdata, or switch to !=.
Choice: Add unnecessary keyword INNER in INNER JOIN?
Choice: Add unnecessary keyword OUTER in LEFT OUTER JOIN?
Choice: Add unnecessary keyword INTO in INSERT INTO?
Choice: Add unnecessary parentheses around low-precedence expressions?
Choice: Add unnecessary keyword COLUMN for ALTER TABLE ADD?
This is pleonasm, which Cambridge defines as "the use of more words than are needed to express a meaning, done either unintentionally or for emphasis;" According to Merriam-Webster pleonasm means "the use of more words than those necessary to denote mere sense" and "Pleonasm is commonly considered a fault of style, but it can also serve a useful function."
That is perhaps the reason that I can not find prescriptive guides or vendor examples that advocate SELECT ALL, or UNION DISTINCT or ORDER BY x ASC.
But saying Add unnecessary keywords and operators = yes should have these three effects, which are activated in at least one prescriptive guide or blog:
[INNER] before JOIN
Mazur says: "Include inner for inner joins", probably the reasoning again is that it makes something easier to read.
Examples in vendor manuals seem to show INNER more often than not.
[INTO] after INSERT
The word INTO is optional in MySQL/MariaDB and SQL Server,
but compulsory in DB2, Oracle, and the SQL standard.
Holywell says:
"Why you would willingly choose a proprietary solution when a standard SQL method already exists is beyond me."
Parentheses around OR conditions
That is, x OR y becomes (x OR y).
The reasoning is that, if there is an AND in the vicinity,
it will take precedence -- but who remembers precedence rules?
As the
Java style guide says:
(regarding mixed operators)
"you should not assume that other programmers know precedence as well as you do".
But add-unnecessary-keywords-and-operators = yes does not mean:
[TRANSACTION] or [TRAN] or [WORK] after COMMIT.
This is the opposite of the INSERT INTO phenomenon --
in this case not every vendor supports the unnecessary word.
So, naturally, you will not see the same usage of these keywords in all vendor manuals.
Prescriptive guides:
Salvisberg says: "Never initialize variables with NULL. ... Variables are initialized to NULL by default." (but probably he does not mean DEFAULT NULL).
Holywell says: "Keep code succinct and devoid of redundant SQL—such as unnecessary quoting or parentheses or WHERE clauses that can otherwise be derived."
Vendor manual examples:
Oracle: does not allow LEFT JOIN with OUTER
MariaDB, MySQL: show LEFT JOIN without OUTER
Bloggers:
"Modern SQL Style Guide", explaining why "left outer join" is bad, says: "`outer` is an unnecessary optional keyword".
Scott Czepiel says: Omit the “outer” when doing a left join ... Likewise omit “inner” from inner joins
Choice: Allow SELECT *?
The main complaint about SELECT * is that the definition of the table might change. However, that is true even if we select with column names, if ALTER can be used to change a column definition.
To be consistent, Allow SELECT * = no should also affect other syntax that is based on assumptions about stable table structures, such as INSERT without INTO, or NATURAL JOIN.
And maybe it is not truly pleonastic to specify column names,
if there is a chance that table definition will change someday.
So maybe SELECT * can be replaced by
SELECT column-name [, column-name ...]
and maybe INSERT INTO table-name can be replaced by
INSERT INTO table-name (column-name [, column-name ...]).
Vendor manual examples:
Since I expect that SELECT * will be used for examples, I did not check.
Bloggers:
Factor says: use SELECT * for "ad-hoc" work, not "production" work.
Choice: Allow ORDER BY ordinal?
"SELECT ... ORDER BY 1;" was once legal in standard SQL but became illegal in the 1999 version. However, all major vendors still support it.
MySQL and Tarantool and DB2 with NPS-mode even support "GROUP BY 1".
"ORDER BY ordinal" would be convenient for
SELECT very_long_expression, very_long_expression FROM ... ORDER BY 2;
However, all major vendors support aliases too.
ORDER BY ordinal" would be convenient for
VALUES ('b', 'c') UNION ALL VALUES ('x', 'y') ORDER BY 2;
However, not all vendors support ORDER BY in such contexts.
Prescriptive guides:
Taranov, Salvisberg say: specify columns.
Salvisberg says: "Always specify column names instead of positional references in ORDER BY clauses."
Vendor manuals:
Oracle: shows ORDER BY position as well as ORDER BY name
DB2: has examples of ORDER BY name only.
SQL Server: says "Avoid specifying integers in the ORDER BY clause as positional representations of the columns in the select list."
Bloggers:
Claire Carroll posted "Write better SQL: In defense of group by 1" ... the gitlab folks would probably approve.
Choice: /* comment */ or --comment?
The main variants of comments are:
bracketed (start with /* and end with */) also called multi-line or block or C-style or slash star,
simple (start with -- and end with newline) also called single-line or line or double-dash.
MySQL allows replacing -- with # for simple comments but that is rare.
Although the SQL standard said that simple comments are mandatory and bracketed comments are optional, nowadays all vendors support both types. Oddly, there are cases where bracketed comments are legal but simple comments are illegal, for example with DB2's DSNTIAUL and in standard PREPARE texts (General Rule 6 = "If P [the contents of the SQL statement variable] does not conform to the Format, Syntax Rules, and Access Rules of a <preparable statement>, or if P contains a <simple comment> then ... an exception condition is raised: syntax error or access rule violation."). I do not know why. Maybe it has something to do with the fact that some types of SQL injection attack depend on simple comments, as described on netsparker.com. Or maybe it has something to do with caching, as a DB2 document hints that only statements with bracketed comments will get into the dynamic cache.
There are documentation generators that have SQL support and that require a particular comment style, often starting with /**.
The other style question is: where to put comments?
ApexSQL has advice for where to put bracketed comments inside stored procedures. but I did not find specific advice about location in other documents (except Don Burleson's blog). Vendor manual examples show comments preceding the statement (on a separate line), or comments following a clause (on the same line, at the end of the line), but I found none that show comments following the statement or comments within a line.
Prescriptive guides:
Taranov says "Always use multi-line comment".
Holywell says "Use the C style opening /* and closing */ where possible" and
"Avoid nesting comments" (with an example of a bracketed comment inside a simple comment).
Salvisberg says: "Inside a program unit only use the line commenting technique --".
Vendor manuals: show both styles, no apparent preference.
Oracle SQL Developer has an option "Put -- comments between /* ... */" but not the other way around, which I guess is a hint about where their sentiments lie.
Bloggers:
Don Burleson says: use simple comments except inside 3GL programs that use the Oracle Precompilers.
Choice: multiple-line comment style 1, or 2, or 3, or 4, or 5?
Choice: precede multiple-line comment with a blank line?
Choice: follow multiple-line comment with a blank line?
Bracketed /* ... */ comments can take up multiple lines. Usually they precede what they are commenting on. It may be hard to reformat them so it is good if each line is short.
I have seen or read about five styles. Usually they are preceded or followed by newlines.
/* This comment is Style 1: Align start and end. */ /* * This comment is Style 2: Align asterisks. */ /* ** This comment is Style 3: Lines start with **. */ /** * This comment is Style 4: Comment starts with **. */ /* This comment is style 5: Comment ends on line end. */
Style 3 is recommended (but not followed) by the SQL Server manual.
Style 4 is a signal to tools like Doxygen that this is for documentation as mentioned earlier.
Vendor examples:
Oracle: style 5.
DB2: style 2.
SQL Server: style 1.
MySQL: style 1.
MariaDB: style 5.
Bloggers:
Apex SQL has recommendations for what should be in the comment, and options for inserting empty lines before/after the comment.
Choice: join syntax: old-style or new-style?
"FROM a, b [WHERE join-condition etc.]" is old style, also called FROM-join.
"FROM a ... JOIN b ON join-condition" is new style, also called ANSI or ISO or SQL92 join.
Back in 2005 Doug Burns described
his pro-and-con thoughts, which I hereby distort:
Pro: changing to an OUTER join is easy because it has almost the same syntax
Pro: it is more unlikely that you will miss a condition and go Cartesian
Pro: mixing up the join-condition with other conditions in the WHERE clause might mislead,
Con: new style is more verbose
Con: old style has a long-established base.
Some bloggers give the wrong impression that Microsoft has deprecated old style joins and/or they are non-standard. In fact Microsoft only deprecated old style outer joins (see this red-gate post and this Microsoft post). In fact the syntax "FROM table_name1 , table_name2" is legal according to the ISO/IEC ("ANSI") SQL:2016 rules for <table reference list> and Microsoft is aware of that.
Prescriptive guides:
Benenson, Salvisberg, Taranov say: use new style.
Vendor manuals:
DB2: the "Inner join" examples mix old and new style without recommendations.
Oracle: the "Using Join Queries: Examples" show only old style. Remember that we're only talking about inner joins here -- for outer joins, Oracle recommends against using the old style and its Oracle-specific join operator.
MariaDB: recommends new style.
SQL Server: says new style is "recommended".
Tarantool: has an introductory section using only old style.
Bloggers:
Salesforce apparently is in the group that thinks that old style is not standard.
Joe Celko says:
"But what I found is that people who use the traditional notation think in sets,
while those who use the infix notation are stuck with a procedural linear mindset."
He also says:
"Weaker SQL programmers use INNER JOIN, since it is sequential and a familiar, procedural infix binary operator.
Stronger SQL programmers use the WHERE syntax because it is set-oriented, obeys the law of proximity and is an n-ary operator."
Choice: abbreviated data type names?
Choice: upper case or lower case data type names?
In standard SQL you can create a column with data type int, but in information_schema it will show up as INTEGER -- the canonical form is always the unabbreviated word in upper case. Unfortunately not every vendor converts canonically but this shows that for the standard INTEGER is a better word.
However, Aaron Bertrand (who wrote a series of blog posts touching on SQL style) switched to int, because of a SQL Server quirk. In SQL Server, if you are querying a system table, it makes a difference what collation you used when you originally defined the whole database. It can mean that you will not find columns defined as 'int' when you search for columns defined as 'INT' or 'INTEGER'. (In SQL Server built-in data type names are not not reserved words.)
For another indicator, again using Ngrams ...
INTEGER PRIMARY KEY is more popular than INT PRIMARY KEY
but varchar is more popular than VARCHAR or CHARACTER VARYING.
Although the INT-versus-INTEGER questions are not settled, there is no question that CHAR(n) is preferred instead of CHARACTER(n). By the way, if you did not know how to pronounce CHAR, see this blog post. For the other data type abbreviation -- DEC versus DECIMAL -- I found no guidance.
Sometimes I have seen that the length, and/or precision and scale, is skipped when it is default. This is probably related to the Pleonasm question -- should unnecessary words be skipped?
Prescriptive guides:
Salvisberg says: "Avoid declaring NUMBER variables, constants or subtypes with no precision."
Holywell says: "It is best to avoid the abbreviated keywords and use the full length ones where available"
(but he is not talking about data types)
Taranov says: "Always specify a length to any text-based data type such as varchar, nvarchar, char, nchar:"
Vendor Manuals:
I just looked at the data types that were used in the manuals' CREATE TABLE examples.
DB2: CHAR(n), CHAR (n), CLOB, CLOB(n), DATE, DECIMAL(n,n), DOUBLE, INTEGER, SMALLINT, TIMESTAMP, VARCHAR(n) (I thought it was interesting tht DOUBLE was used although DOUBLE PRECISION is standard)
Oracle: BLOB, CHAR(n), CLOB, DATE, NUMBER,, NUMBER(n), NUMBER(n,n), number(n), NCLOB, VARCHAR2(n), varchar2(n) (I thought it was interesting that CHAR(n) was used when n <= 2 and NVARCHAR(n) was used when n > 2 but doubt that is a rule)
SQL Server: char(n), DATETIME, datetime, float, INT, int, money, nvarchar(n), smallint, VARBINARY, VARCHAR(n)
MySQL: BLOB, CHAR(n), DATE, DATETIME, INT, VARCHAR(n)
MariaDB: bigint, BLOB, CHAR(n), DATETIME INT, int, varchar(n)
Tarantool: INT, INTEGER, SCALAR, STRING
... Summary: vendors do not follow style guides in this area.
Oracle SQL Developer has an option "Case change", and if you pick "UPPER" then VARCHAR2 is upper case, that is, it is a keyword like any other.
This redgate forum post shows that there are different options for keywords UPPERCASE and data types lowercase.
Choice: allow non-standard literal formats?
In standard SQL the format of a literal determines its type:
inside '' -- CHAR
inside X'' -- BINARY or VARBINARY
inside DATE '' -- DATE
exponential notation -- DOUBLE or REAL or FLOAT
[sign]digits[[period][digits]] INTEGER or SMALLINT or DECIMAL or NUMERIC
and so on.
But vendors vary, and due to implicit casting the format does not really indicate much.
What then should we do if we want to put ' within ''?
MySQL/MariaDB and SQL Server sometimes let us say "...'..." (they have to use something else for delimiting identifiers).
Some DBMSs let us escape by saying '...\'...'.
Some JDBC APIs let us escape by saying {escape 'escape character'}
Oracle lets us change what the character string delimiter is with q'{...'...}'
Perhaps all DBMSs let us say '...''...'.
Prescriptive guides:
Nobody says anything.
Vendor manuals:
Examples usually show '...''...' as it is the only standard unless the DBMS supports Unicode escaping.
Blogs:
Louis Davidson suggests functions quotename() and concat() for putting strings together.
Choice: use standard syntax, or use a continuation character, or use ||
This situation comes up with both character and binary strings:
the string is too long (presumably that means "Maximum line length" is exceeded).
The string has to be broken up and placed on multiple lines.
"use standard syntax" means: depend on the fact that in standard SQL 'A' /* whitespace */ 'B' is interpreted as 'AB'. So
SELECT 'video meliora proboque, ' 'deteriora sequor.' ...But, alas, some vendors do not allow it.
"use a continuation character" means depend on the client (or possibly the server) to see a line-continuation signal. So
SELECT 'video meliora proboque, '\ 'deteriora sequor.' ...But, alas, some vendors use different continuation characters.
"use ||" means depend on support of || for concatenation. So
SELECT 'video meliora proboque, ' || 'deteriora sequor.' ...This is the best bet for vendor support; even MySQL/MariaDB can be forced to accept || for concatenation sometimes.
Prescriptive guides:
I have no information from any prescriptive guide.
Vendor manuals:
Oracle: In SQL/Plus, the continuation character is a hyphen.
DB2: line continuation is possible with \ but this is in the client
SQL Server: line continuation is possible with \ and this says that is part of T-SQL not the client
Choice: Names should mean something?
We all know the setting should be 'yes' but if you are trying to make a point about syntax, then a placeholder name has all the meaning that you need, because the referent could be anything at all. So in a document like this I use names like TABLE_NAME or Column1 but expect that people would be more specific when representing something specific.
As far as I can tell the foobar placeholders, known in other languages, are not frequent in SQL contexts. Perhaps we have fewer reasons to swear.Prescriptive guides:
No need to quote. Everybody would agree.
Choice: Names UPPER CASE or lower case or snake_case or PascalCase or camelCase?
Terminology:
camelCase = (all words except the first start with a capital letter)
PascalCase = (all words start with a capital letter) sometimes called upper camel case
SNAKE_CASE = all words all capital letters, underscore separator)
snake_case = all words all lower case letters, underscore separator)
...
Inevitably, in the following Names sections, I am going to have to
give examples which use a particular case.
This does not mean I recommend a particular case, I leave recommendations to prescriptive guides.
One argument against lower case is that it does not reflect what the standard information_schema name will look like, since the standard rule is that the name will be folded to upper case before being stored. PostgreSQL ignores the SQL standard (see the blog post "Sometimes MySQL is more standards-compliant than PostgreSQL"), but some other DBMSs do not.
One argument against snake_case is the academic conference presentation by
David Binkley and Marcia Davis and Dawn Lawrie and Christopher Morrell,
"To camelcase or under-score"
which concluded:
"... it be-comes evident that the camel
case style leads to better allaround performance once a subject is trained on this style."
But their experimenting was not in an SQL situation
where names are distinguished by indentation or by having all non-identifiers in upper case.
And anyway there is another study by
Bonita Sharif and Jonathan I. Maletic,
"An Eye Tracking Study on camelCase and under_score Identifier Styles",
which concluded:
"While results indicate no difference in accuracy between the two styles,
subjects recognize identifiers in the underscore style more quickly."
(I have only read the abstract.)
Prescriptive guides:
Holywell says: "Use underscores where you would naturally include a space in the name" ... "Avoid ... CamelCase—it is difficult to scan quickly." (Apparently this is a reference to what I have called PascalCase.)
Salvisberg says: "write all names in lower case"
Benenson gives an example of snake_case: "SELECT COUNT(*) AS backers_count" and says "Variable names should be underscore separated:" (I consider "underscore separated" to be a synonym of "snake_case").
Sarkuni says: snake_case
Mullins says: SNAKE_CASE
Taranov says: PascalCase for everything except database, schema, and synonym.
Factor says: "Schema objects are, I believe, better capitalized." (example) "This_Is_Capitalized"
Vendor manual examples:
MariaDB, MySQL, Oracle, Tarantool: usually snake_case
SQL Server: usually PascalCase
DB2: usually SNAKE_CASE
SQL Standard: SNAKE_CASE
For example these are extracts from the manuals' CREATE TABLE pages.
CREATE TABLE employees_demo ... (Oracle)
CREATE TABLE EMPLOYEE_SALARY ... (DB2)
CREATE TABLE CREATE TABLE dbo.PurchaseOrderDetail ... (SQL Server)
CREATE TABLE client_firms ... (MySQL)
CREATE TABLE table_name ... (MariaDB)
CREATE TABLE modules ... (Tarantool)
Choice: Names can include $?
Choice: Names can include letters other than A-Z?
What characters should be legal for regular identifiers of most objects? In standard SQL, the answer is _ or digit or any Unicode character that is considered to be a letter, so Cyrillic / Japanese kana / Chinese / etc. are all okay But in practice the answers vary widely, as one can see by looking at the vendor documentation of DB2, Oracle, SQL Server, MariaDB, Tarantool (see "Identifiers"). The minimum common denominator is A-z, a-z, 0-9, _ and -- for some reason I have never understood -- $. The maximum is any Unicode letter, digit, _, $, and sometimes # or @.
I used to expect that (say) Russian developers would regard Latin-alphabet names as inferior to Cyrillic-alphabet names. But evidence for that expectation is totally lacking. Possibly one thing that holds people back is the fear that some client or driver will misplace the character set and show garbage? Improve your vocabulary: the word for such misplacement is mojibake.
Prescriptive guides:
Holywell says: "Only use letters, numbers and underscores in names." Taranov says: "use only Latin symbols [A-z] and numeric [0-9]."
Vendor manuals:
DB2's "Some ODBC Applications May Not Be Able to Handle Special Characters in Table/Column Names" warning is that there are problems with drivers with some special characters.
Choice: maximum length: 18 or 30 or 64 or 128?
There is a vendor-dependent maximum length. Usually it is 128 (DB2, Oracle, SQL Server), or 64 (MySQL/MariaDB), or more than 20000 (Tarantool). Shorter names were required in olden times and may still be required for a few objects.
But the desirable column-name length will be less than the maximum if column names are used as headers in reports. Here I am assuming that the name will inevitably be the label, although JDBC thinks otherwise.
Length considerations might induce people to use abbreviations, and that must be okay because the SQL standard uses abbreviations too.
Prescriptive guides:
Mullins says: recommended maximum length = 128
Holywell says: "Keep the length to a maximum of 30 bytes"
Taranov says: 128 for most things
Salvisberg says:
"Avoid using abbreviations unless the full name is excessively long.
Avoid long abbreviations. Abbreviations should be shorter than 5 characters.
Any abbreviations must be widely known and accepted.
Create a glossary with all accepted abbreviations."
Vendor examples:
Microsoft's "Column name limitations" warning is that there are problems with drivers with long names and special characters.
Bloggers:
Northwestern University "Data base object naming standards: Abbreviations" is a huge list prefaced by the words "In general, abbreviations should be used only when length restrictions prevent use of fully spelled-out words in object names."
Choice: Should identifiers be delimited?
If possible this means enclosing "..."s as in standard SQL, but may mean enclosing `...`s or enclosing [...]s.
Again, there might be some reason for explanatory names for reporting requirements,
but for programming requirements the only possible reasons would be:
() You are using more than one SQL dialect and you want to maintain the same names in both dialects
() You prefer case sensitive names and your SQL dialect follows the SQL standard requirement for delimited identifiers.
Re [...]: Microsoft's rules for SET QUOTED_IDENTIFIER are so complicated that nobody can be blamed for using []s instead.
Re `...`: Although people call them backticks when used for delimiting, the real (Unicode) name is U+0060 "grave accent". If you have a French keyboard where a grave accent is a "dead key", it might be troublesome.
Prescriptive guides:
Sarkuni says: "Avoid quotes."
Donnelly says: "Named objects should not be surrounded by backticks"
Salvisberg says: "avoid double quoted identifiers" ... "Never use quoted identifiers."
Holywell says: "Avoid ... Quoted identifiers -- if you must use them then stick to SQL92 double quotes for portability".
Bloggers:
Apex's "Rules of SQL formatting" say: Adding square brackets around all identifiers can be visually distracting. but they have an option named "Remove unnecessary brackets".
Cloudera's Impala Guide recommends: "consider adopting coding conventions (especially for any automated scripts or in packaged applications) to always quote all identifiers with backticks."
Choice: Should names be qualified?
Saying "SELECT employees.id FROM employees;" looks like pleonasm.
Saying "SELECT employees.id FROM employees, departments;" looks like pleonasm but maybe serves a "useful function".
Saying "SELECT id FROM schema_name.table_name;" looks like a necessity if there is no default schema.
I have rarely seen the four-level catalog_name.schema_name.table_name.column_name. At MySQL we played with it for a while for information_schema but in my opinion we should have reserved the top level for the server source, which is something that I think SQL Server has approached. People who get deep in qualifier mires might be tempted to add aliases.
Prescriptive guides:
Sarkuni says: "All field names in non-trivial SQL statements (i.e. those with more than one table) should be explicitly qualified and prefixing as a form of namespacing field names is a bad idea."
Mazur says: "Include the table when there is a join, but omit it otherwise" (i.e. qualify if there is a join).
Blogs:
Tim Hall passes on advice about avoiding name clashes in PL/SQL.
Aaron Bertrand says that leaving out a schema name can have side effects.
Steven Feuerstein says "Qualify every identifier in SQL statements."
Choice: Should any object name have any prefix or suffix?
Some say that an object's name should indicate its type, and some say otherwise. I will just note here what the otherwise-sayers say in prescriptive guides. Sarkuni says: "Object names should not include the object type in them." Mullins says: "it is a bad idea to embed specialized meaning into database object names" (he gives an example of a suffix that means unique, and asks what if it is also a foreign key). Salvisberg says: "Avoid adding redundant or meaningless prefixes and suffixes to identifiers. Example: CREATE TABLE emp_table." Holywell says: "Avoid ... Descriptive prefixes or Hungarian notation such as sp_ or tbl."
Improve your vocabulary again ... Factor says: "The habit most resistant to eradication is "Tibbling," the use of reverse Hungarian notation, a habit endemic among those who started out with Microsoft Access. A tibbler will prefix the name of a table with "tbl," thereby making it difficult to pronounce."
Keller says: "Avoid prefixes and suffixes for tables and views, such as tblTable. Hungarian notation (which was always intended to identify variable usage) slipped into common SQL Server naming conventions, but it is widely derided. Object identifiers should describe what is contained within, not the object itself."
But other prescriptive guides say: go for it.
So in later sections I will repeat the advice of people who advocate prefixes and suffixes, but those sections all have an implicit caveat: others disagree with all such paraphernalia.
If you decide "Should any object name have any prefix or suffix=no", then ignore any prefix/suffix Choices in following sections.
Choice: Should table names be plural?
Sometimes no. In English and many other languages pluralizing can be confusing or inconsistent. Some tables by definition have exactly one row so demanding plurals everywhere can be excessive. And of course sometimes noted authorities say use singular, so just go along to get along.
One authority that is sometimes brought in is Codd. He used singular for relations. But Fabian Pascal says "In other words, there is no relational (i.e., formal theoretical) reason for preferring one over the other -- Codd's reference to the "Supplier relation" should not be taken to mean imposition of the singular."
One authority that is sometimes brought in is ISO 11179. But I have been unable to find where that states that singular names are better, and it seems that other people have also been unable.
Prescriptive guides:
Hall says: All table names should be plural. If the table name contains several words, only the last one should be plural. Example: APPLICATION_FUNCTION_ROLES
Sarkuni says: "Tables, views, and other relations that hold data should have singular names, not plural. Rather than going into the relational algebra explanation of why this is correct I will give a few practical reasons. They're Consistent. It's possible to have a relation that holds a single row. Is it still plural? They're unambiguous. Using only singular names means you do not need to determine how to pluralize nouns. Ex: Does a "Person" object go into a "Persons" relation or a "People" one? How about an "Octopus" object? Octopuses? Octopi? Octopodes? Straightforward 4GL Translation. Singular names allow you to directly translate from 4GL objects to database relations. You may need to remove some underscores and switch to camel case but the name translation will always be straight forward. Ex: team_member unambigously becomes the class TeamMember in Java or the variable team_member in Python."
Keller says: "At first glance, it’s natural to think of a collection of objects in the plural. A group of several individuals or companies would be Customers. Therefore, a table (being a collection of objects) should be named in the plural. An individual row in that table would be a single customer. The ISO/IEC naming principles, while dated, recommend pluralized table names and singular column names. Most SQL Server system tables use plural names (sysnotifications, sysoperators), but this is inconsistent. Why sysproxylogin and not sysproxylogins?" ... "Because pluralization of words can vary in so many ways (customers, mice, moose, children, crises, syllabi, aircraft), non-native speakers have additional challenges. Sticking with singular object names avoids this problem entirely."
Salvisberg says: "Plural name of what is contained in the table (unless the table is designed to always hold one row only – then you should use a singular name)." and "A jar containing beans is labeled "beans"."
Mazur says: "Table names should be a plural snake case of the noun"
Holywell says: "Avoid ... Plurals—use the more natural collective term where possible instead. For example staff instead of employees or people instead of individuals. ... "Use a collective name or, less ideally, a plural form. For example (in order of preference) staff and employees."
Taranov says: not plural.
In some SQL standard examples, there are plurals when contents can be plural, singular when contents can only be singular
Bloggers:
Peter Gulutzan (also known as "I") said long ago in "SQL Naming Conventions": Example: SELECT * FROM beans; not SELECT * FROM bean;
Choice: temporary table name prefix = # or tmp or temp or nothing?
With some DBMSs, it is possible to create a table named A and a temporary table named A. References to A will only find the temporary table, and the result is occasional confusion. So it might be okay to recommend, or force, people to distinguish temporary table names.
Prescriptive guides:
Salvisberg says temporary tables can have an optional suffix _tmp.
Vendor manuals:
In SQL Server a prefix # or ## is compulsory. (The Unicode name for the # symbol is "number sign", not "hash mark" and, alas, not the lovely word "octothorpe").
Bloggers:
Stephen Faroult and Pascal l'Hermite in "Refactoring SQL Applications" suggest that it is common for temporary tables to have a prefix or suffix of TMP or TEMP.
Choice: view name prefix = v_ or vi_ or vw_ or nothing?
Choice: view name suffix = _v or _vi or _vw or nothing?
View names are table names and are not distinguished from base table names.
Views are tables so the Choices in the previous section are applicable. However, sometimes people want to distinguish them with a prefix or suffix. In particular, I have seen a suggestion that, if a view is of a single base table, its name may be the base table name plus the suffix _v.
Keller says: view suffix = _v. Taranov says: View prefix = VI_ Salvisberg says: optionally add _v
Bloggers:
Joe Celko says "VIEWs are often named incorrectly. A VIEW is a table, so it is named just like any other table. The name tells us what set of things it represents in the data model. The most common offender is the “Volkswagen” coder who prefixes or suffixes the VIEW name with “vw_” or “VIEW_” in violation of ISO-11179 rules. We do not mix data and meta data in a name. This is as silly as prefixing every noun in a novel with “n_” so the reader will know that the word is a noun in English grammar."
(That reminds me that once upon a time writers did distinguish nouns by capitalizing them. Germans still do.)
Sharad Maheshwari and Ruchin Jain, "DBMS – Complete Practical Approach", say: "All view names must begin with of the following prefixes: vw, VW, v, or V."
Mullins says: "avoid embedding a 'T', or any other character, into table names to indicate that the object is a table. Likewise, indicator characters should be avoided for any other table-like object (i.e. aliases, synonyms, and views)."
isbe.net "Naming conventions" says: prefix with vw.
Choice: Should column name include table name or abbreviated table name?
Column names are singular, although I suppose that columns with an array or multiset data type might not be (the only guide that suggests such an exception is Salvisberg).
The name "id", which is an abbreviation of identification, usually for a unique-key column with a numeric data type, is popular. But opinions vary whether a column should be named id or have a suffix _id. If you join two tables, and they both have a column named id, then you are forced to qualify. But if you avoid that by putting the table name in the column name, for example the employees table has employee_id and the departments table has department_id, then you are being redundant because you already know what tables the columns are in. An advantage of naming all primary keys id is: you know what it is without looking it up.
Names in other computer languages have hints about types, for example is_fat is probably a boolean. But opinions vary whether this should occur in SQL.
Prescriptive guides:
Hall says: do not include table name or alias.
Sarkuni says: name = id is okay ("This means that when you're writing SQL you don't have to remember the names of the fields to join on") and a column referencing it should have referenced-table-name and suffix _id.
But Keller says: if there is a column referencing it, then name = CustomerID is better than Customer.ID.
Holywell says: name = id is not okay ("avoid").
Mazur says: "Boolean fields should be prefixed with is_, has_, or does_. For example, is_customer, has_unsubscribed, etc."
Vendor manuals:
I checked what column names were in examples for the CREATE TABLE statement.
MariaDB: a, b, expires, x
MySQL: id, name, a, b, adate, s1, s2
Oracle: employee_id, first_name, last_name, email, phone_number, hire_date, job_id, salary, commission_pct,
manager_id, department_id, dn, id, col1, col2, title, author, department_id, department_name, etc.
DB2: DEPTNO, DEPTNAME, MGRNO, ADMRDEPT, PROJNO, PROJNAME, RESPEMP, PRSTAFF, PRSTDATE, EMPNO, SALARY, ID, NAME, LIVING_DIST,
SSN, VOICE, PHOTO, HIREDATE, COMM, ACTNO, EMPTiME, etc.
SQL Server: col1, x, EmployeeID, PurchaseOrderID, LineNumber, ProductID, UnitPrice, OrderQty,
ReceivedQty, RejectedQty, DueDate, rowguid, LineTotal, StockedQty, LName, FName, GUID, low, high, etc.
Tarantool: s1, s2
... Notice the frequency of _id or NO suffix, DB2's love of abbreviations, and
how both MySQL and Tarantool use s1 and s2 (you can blame me for that, s stands for Spalte which is the German word for column).
Notice that SQL Server does capitalize when there is only one word, but probably this is not a rule.
Blogs:
Lukas Eder points out that qualifying id in SQL statements is not good enough if you also have to distinguish in a Java client.
Power BI (apparently thinking about things like report headings) says: "For example, you should use “Sales Amount” instead of “SalesAmt” or “SalesAmount”. You can use space and special characters in column names of a view. The goal is to simplify the life to the user, and not to simplify the life to a programmer who has to type a column name in the keyboard."
Choice: Should aliases / range variables be abbreviations?
In a select list a column can be named with [AS] alias-name. Aliases can be used in ORDER BY, and in MySQL/MariaDB can be used in GROUP BY, and with (rarely supported) "lateral column aliasing" they can even be used later in the select list, but the main purpose is just so that the result name can be short and legal. Vendors might invent names for expressions but they can vary and can be odd.
Also in FROM etc. an [AS] name clause can appear. Often the name is called an alias but in standard SQL terms like "correlation name" and "range variable" are preferred. One main purpose is to avoid repeating long qualified names.
For columns the common idea is to be comprehensible, for tables the common idea is to be short.
Prescriptive guides:
Holywell, Mazur, and Taranov say: Always be explicit and say AS name. Pleonasm does not apply.
Donnelly says: Always be explicit and say AS name -- for columns. "Column aliases should always use the keyword AS"
Donnelly says: "Tiny names for table aliases can sometimes work as abbreviations."
Holywell says: maybe, alias name should be first letter of each word in a table name
Mazur says: "Avoid aliasing table names most of the time"
Benenson says: "Always rename all columns when selecting with table aliases"
Salvisberg says: "Always use table aliases when your SQL statement involves more than one source."
Vendor manuals:
Oracle: "AVG(salary) AS avgsal"
SQL Server: "AVG(UnitPrice) AS [Average Price]" ... an old version of the manual says: "The AS clause ... is the preferred syntax to use in SQL Server 2005." Perhaps one could interpret that as meaning the word AS should be explicit, but I prefer to believe that it means: stop using the deprecated alternative, an equal sign
MySQL: "CONCAT(last_name,', ',first_name) AS full_name", "id AS 'Customer identity'" (very non-standard) ... "it is good practice to be in the habit of using AS explicitly when specifying column aliases"
Blogs:
ApexSQL has an option named "Use first three letters for alias" and an example "HumanResources.EmployeeDepartmentHistory edh".
Aaron Bertrand says: one to three letters, but no random single letters please.
Choice: Use automatically generated names where possible?
Choice: Constraint (Primary Key) Suffix = _pk or _PK or nothing
Choice: Constraint (Primary Key) Prefix = pk or PK or pk_ or PK_ or nothing
Choice: Constraint (Unique Key) Suffix = uq or UQ or _uq or _UQ or _uk or _UK or nothing
Choice: Constraint (Unique Key) Prefix = uq or UQ or uq_ or UQ_ or uk_ or UK_ or nothing
Choice: Constraint (Foreign Key) Suffix = fk or FK or _fk or _FK or nothing
Choice: Constraint (Foreign Key) Prefix = fk or FK or fk_ or FK_ or nothing
Choice: Constraint (Check) Suffix = ck or CK or ck_ or CK_ or nothing
Choice: Constraint (Check) Prefix = ck or CK or _ck or _CK or nothing
Sometimes constraint names are decided by the vendor, for example when the word UNIQUE is used in a CREATE TABLE statement. This is fine if you intend to use the vendor's style, but usually it looks machine-generated. For example if you say CREATE TABLE m (... s1 INTEGER UNIQUE); for Tarantool it is named unique_unnamed_M_1 (type + name + table-name + integer), for DB2 it is a timestamp, for MySQL/MariaDB it is the column name. If you think that as a human you can do better, you will want to override the DBMS, always, by specifying with your own system. And there is no reason that the name of a constraint should differ wildly from the name of an index that it depends on.
Typically a constraint name will include a table name (sometimes abbreviated or aliased), a column name if applicable, and a suffix indicating the constraint type. Some suggesters, for example this bugs.mysql.com issue#66051 and cakephp's "Database Conventions", say that for foreign keys the table-name part should be singular, for example if the table name is orders then the table-name part of the foreign key name should be order.
Prescriptive guides:
Holywell says: depend on the vendor name (generally it is "sufficiently intelligible"), otherwise make a custom name
Keller says: prefix PK or CK or FK or UQ "sometimes"
Hall says: table + _PK or _FK or _CK. For foreign key: referencing-table + referenced-table + _FK. For check: something + _CHK.
Taranov says: PK_ or CK_ or FK_
Salvisberg says:
For primary key: table + _pk.
For unique key: table + _uk.
For check: table + column-or-role + _ck + optional number
For foreign key: referencing-table + referenced-table + _fk + optional number
Mullins asks: what if a constraint is both foreign-key and unique, what do the suffix-lovers do? I have seen no answers.
Vendor manuals:
Oracle: (for foreign key) fk_deptno, fk_empid_hiredate, (for check) check_divname, check_office, (for not null) nn_qty, emp_salary_nn_demo (for unique) promo_id_u, wh_unq, emp_email_uk_demo (for primary key) loc_id_pk, sales_pk.
DB2: (for check) phoneno_length, YEARSAL, (for unique) EMP_ACT_UNIQ, (for foreign key) EMP_ACT_PROJ
SQL Server: (for check) CK_emp_id (for default) DF_PurchaseOrderDetail_ModifiedDate (for primary key) PK_PurchaseOrderDetail_PurchaseOrderID_LineNumber, Guid_PK
MySQL: (for check) c1_nonzero, c2_positive (for foreign key) child_ibfk_1
MariaDB: (for check) a_greater
Bloggers:
Microsoft's "Schema Generation Wizard" has options for prefixes of names of primary-key columns and foreign-key columns: PK and FK.
Shane O’Neill has arguments for why all constraints should have names.
Choice: Include table name in index name?
Choice: Index name prefix = idx or ix or nothing?
Choice: Index name suffix = i or idx or ix or nothing?
Often an index will be associated with a constraint, and so the rules for "Names of constraints" will apply to indexes. Otherwise, the components of an index name are: maybe a prefix, maybe the table name, maybe one or more column names, maybe a suffix, maybe an ordinal.
For the typical index that is a result of something like
CREATE INDEX index-name ON table-name (column-name [, column-name ...])
the index is in the table's namespace, so there is no problem
having an index named I for one table and another index named I for another table.
Therefore it is redundant to include table-name in index-name, but
it happens anyway.
Prescriptive guides:
Sarkuni says: "Indexes should be explicitly named and include both the table name and the column name(s) indexed.
Including the column names make it much easier to read through SQL explain plans."
Keller says: prefix = IX.
Hall says: name = table + column(s) + _I
Salvisberg says: name = table + column(s)-or-purposes + _idx.
Vendor manual examples:
Oracle: "cust_eff_ix ON customers", "ord_customer_ix ON orders", "idx_personnel ON CLUSTER personnel", "area_index ON xwarehouses", "upper_ix ON employees (UPPER(last_name))", "income_ix ON employees(salary + (salary*commission_pct))", "cust_last_name_ix ON customers (cust_last_name)" etc.
DB2: "UNIQUE_NAM ON PROJECT(PROJNAME)", "JOB_BY_DPT ON EMPLOYEE (WORKDEPT, JOB)", "IDX1 ON TAB1 (col1)", "MYDOCSIDX ON MYDOCS(DOC)" etc.
SQL Server: "i1 ON t1 (col1)", "IX_VendorID ON ProductVendor (VendorID)" "AK_UnitMeasure_Name ON Production.UnitMeasure(Name)", "IX_INDEX_1 ON T1 (C2)" etc.
MySQL: "part_of_name ON customer (name(10))", "idx1 ON t1 ((col1 + col2))", "id_index ON lookup (id)"
MariaDB: "HomePhone ON Employees(Home_Phone)", "xi ON xx5 (x)"
... None of the manuals have a consistent style, but we do see some index names that have prefixes/suffixes, table names, column names, or function names.
Choice: Trigger name suffix = _trg or _TRG or _tr or or _TR or nothing?
Choice: Trigger name prefix = _tr or TR_ or nothing?
If a trigger involves an action on a table, everyone seems to agree that the trigger name should say something about the action and the table. Not everyone agrees which comes first, and not everyone agrees what "something" is. It could be the verb as a suffix (for example _insert) or it could be an abbreviation (for example I or ins) or it could merely be a "suggestion" of the verb (for example new_).
If for column names you decide "do not include the table name", why for trigger names should you decide "do include the table name"? Here is a possible excuse: columns are parts of tables, but triggers are independent objects, so this is a different matter. In PostgreSQL a trigger is "local" to a table (a departure from standard SQL) so I do not know whether the usual rules should apply there.
Prescriptive guides:
Hall says: "Trigger names should be made up of the table name, an acronym representing the triggering action and the suffix "_TRG"."
Example: APPLICATION_BIS_TRG where BIS stands for BEFORE INSERT STATEMENT-LEVEL"
Taranov says: TR_ prefix and _DML or _DDL suffix
Salvisberg says:
either object-name + b or io (for before-row or instead of) i or u or d (for insert or update or delete) without _trg suffix,
or object-name + suggestion-of-the-verb + _trg.
Vendor manuals:
I looked at examples in the CREATE TRIGGER pages and sub-pages.
Oracle: t, "order_info_insert INSTEAD OF INSERT ON order info", dept_emplist_tr , "maintain_employee_salaries FOR UPDATE OF salary ON employees
DB2: "NEW_HIRED AFTER INSERT ON EMPLOYEE", "RAISE_LIMIT AFTER UPDATE OF SALARY ON EMPLOYEE" "NEWPROD NO CASCADE BEFORE INSERT ON PRODUCT"
SQL Server: reminder1, "Purchasing.LowCredit ON Purchasing.PurchaseOrderHeader AFTER INSERT" "connection_limit_trigger ON ALL SERVER WITH EXECUTE AS 'login_test' FOR LOGON"
MySQL: "ins_sum BEFORE INSERT ON account", "ins_transaction BEFORE INSERT ON account", "upd_check BEFORE UPDATE ON account"
MariaDB: "increment_animal AFTER INSERT ON animals"
Tarantool: "stores_before_insert BEFORE INSERT ON stores"
Choice: Sequence name suffix = _seq or nothing? Choice: Sequence name prefix = sq_ or nothing?
If a sequence is associated with a single other object, such as a table or the primary-key column of a table, then there is agreement that the sequence name should say something about that object. There is no agreement, as I said earlier, to have a prefix or suffix to indicate object type.
Prescriptive guides:
Salvisberg says: if the sequence is for a table's primary-key generation: table-name-or-abbreviation + _seq
Taranov says: sq_ prefix
Vendor manuals:
Oracle: customers_seq
SQL Server: Test.CountBy1, Test.CountByNeg1, ID_Seq, TestSequence, SmallSeq, DecSeq (apparently they're naming based on increment-value and on data type, as well as object-name + suffix)
DB2: ORDER_SEQ
MariaDB: s, s2, s3
Here is an Oracle document saying _SEQ. Here is Oracle's "2-Day Developer's Guide" saying "Tip: When you plan to use a sequence to populate the primary key of a table, give the sequence a name that reflects this purpose. (This topic uses the naming convention table_name_SEQ.)"
Bloggers:
Richard Foote and Mandeep K Sandhu and Vinish Kapoor uses _seq.
The progress.com manual says: Do not define ORACLE sequences with names ending in _SEQ unless the dataserver manual instructs you to do so. The DataServer uses ORACLE sequences whose names end in _SEQ for internal purposes.
The PL/SQL Style Guide says all sequence names should end in _seq
SunCom says: add _SEQ
Choice: Use prefix or suffix for function or procedure name?
Function names show what the returned value is, and optionally how it was obtained.
Some people
wonder whether a "get_" prefix is necessary.
Example: first_initial_of_name
Example: name_in_upper_case_according_to_german_collation
Built-in function names are upper case. They are not keywords and not reserved (at least not always), and they are like ordinary functions, so if we put them in upper case it is not because we are following a general rule that only keywords are upper case. However, it is popular to put them in upper case, as some vendors' manuals show. Example: SELECT ABS(1); not SELECT abs(1);
Prescriptive guides:
Salvisberg says: (re function) Name is built from a verb followed by a noun in general. Nevertheless, it is not sensible to call a function get_... as a function always gets something. The name of the function should answer the question “What is the outcome of the function?” Optionally prefixed by a project abbreviation.
Salvisberg says: (re procedure) Name is built from a verb followed by a noun. The name of the procedure should answer the question “What is done?” e.g. calculate_salary
Bloggers:
ApexSQL's Rules of SQL formatting – SQL naming conventions and capitalization rules says: "Also, the prefix in the name of the stored procedure should not be sp_".
Savepoint names may contain a hint of what went before, for example after inserting the name could be: inserting. However, there is no common practice, so there is no need for a "Choice: ..." here.
Vendor manual examples:
Oracle: do_insert my_savepoint
DB2: SAVEPOINT1, SAVEPOINT2, SAVEPOINT3
SQL Server: ProcedureSave
Tarantool: x
Choices: Collation names: BCP style or Microsoft style or roll-your-own?
This would apply only for vendors which allow users to create collations, which are rare.
BCP style is what I call the use of the Unicode Locale Data Markup Language (LDML), for example "da-u-ks-level1" where da = language tag for Danish, u stands for BCP 47's locale extension, ks is the key for strength from the Collation Settings table and level1 means level1. No character set or code page.
Microsoft style is what I call the suffixes and Microsoft's fixed rules: The suffix _ai ("accent insensitive") and the suffix _ci ("case insensitive"). Although _ci_ai might sometimes be very very roughly equivalent to a standard level1 collation, level1 really should also mean things like "ignore Japanese katakana/hiragana differences" and "ignore differences between S and Sharp S" and much else that has absolutely nothing to do with accents and cases.
The roll-your-own choice would be something more descriptive but less "standard" than the other choices, for example GermanPhoneBook.
Prescriptive style guides and vendor manual examples: none.
Choice: input-parameter prefix: i_ or in_ or nothing
Choice: output-parameter prefix: out_ or nothing
Choice: local-variable prefix: l_ or v_ or nothing
Choice: parameter names: UPPER CASE or snake_case or PascalCase or camelCase
There is an incentive to use a prefix. Consider a MySQL procedure:
CREATE TABLE employees (thing INTEGER); INSERT INTO employees VALUES (0); CREATE PROCEDURE select_thing() BEGIN DECLARE thing INTEGER DEFAULT 1; SELECT thing FROM employees; END;Will it select 0 or 1? Yes there is a rule -- MySQL bug 5967 where Konstantin Osipov improves our vocabulary again by mentioning "name shadowing" -- but it is easy to forget it. So if there was a convention that always distinguished column names from variable names, this particular confusion could be avoided.
Standard SQL allows for qualifying variable names with the block label, but not all vendors allow that.
Prescriptive guides:
Hall says: "PL/SQL variables are prefixed with a single letter, if possible, to indicate their type or usage." e.g. l_ + local variable name, i_ for input parameter name
Salvisberg says: in_ prefix for input parameter, out_ prefix for output parameter, l_ prefix for local variable
Taranov says: "Parameters name should be in camelCase"
Vendor manual examples:
I only looked at DECLARE examples:
Oracle: no suffixes or prefixes, snake_case
DB2: Prefix v_ then camelCase, for example v_rowsChanged
SQL Server: Variable names always start with @, after that the case examples are inconsistent.
Bloggers:
Malcolm Coxall, "Oracle Quick Guides", says "Parameters names must begin with p_. Variable names must begin with v_ ... IN parameter can be named _in" OUT parameter can be named _out ... INOUT parameter can be named _inout".
Choice: Use host language's multiple quoting?
Choice: Put SQL statement text in variables?
This applies for EXECUTE IMMEDIATE but also for non-SQL programs that call SQL. The problem is always that execute("sql-statement") will not look nice if sql-statement contains "s, forcing escapes. Escapes -- usually \s -- make a statement less easy to read.
Languages such as Lua support
multiple quoting,
so the Tarantool manual suggests things like
execute([[SELECT 'string-literal' FROM "delimited-table-name";]])
instead of
execute("SELECT 'string-literal' FROM \"delimited-table-name\";")
But maybe you should not pass string literals, you should instead assign the literals to variables and pass variables. It is common in other languages to use a variable or a macro for any constant that might be re-used, and the same could apply for SQL.
Prescriptive guides:
Salvisberg says: "Always use a character variable to execute dynamic SQL."
Examples from vendor manuals:
Oracle shows an EXECUTE IMMEDIATE statement with a character variable.
DB2 shows an embedded-SQL example with a character variable.
Blogs:
"Pro Oracle SQL" says that using bind variables may decrease parsing overhead.
Choice: format?
Formatting, also called layout, involves adding or removing white space (spaces or tabs or newlines). Effects include indenting and aligning.
Choosing format = no means: leave the text as is.
Choosing format = yes means: remove white space that is already in the text, except in comments or literals or delimited identifiers. Then add white space according to Choices described in following Format sections.
Indenting means: adding fixed amounts ("units") of whitespace at the start of a line. For example one can say that the indent amount is "4 spaces". Then in this statement the second line is indented:
TEXT AT LEFT MARGIN, ALSO CALLED THE ZERO POINT TEXT INDENTED BY ONE UNITThe implication is that the second line is "within", "subsidiary to", "a level below", the first line.
If there is a third line which is subsidiary to the second line, then that is a multi-level indent.
TEXT AT LEFT MARGIN, ALSO CALLED THE ZERO POINT TEXT INDENTED BY ONE UNIT (AT LEVEL ONE) TEXT INDENTED BY TWO UNITS (AT LEVEL TWO) TEXT INDENTED BY ONE UNIT (AT LEVEL ONE AGAIN)I say the first line is at "indent zero", the second line is "indent +1", and the third line is also at "indent +1" (because "+" will mean "relative to the previous line"). It is not necessary to show "indent -1" because text end is obvious, it is before the next text starts.
Aligning, also called lining up, means: placing non-white-space text underneath non-white-space text on an earlier line. Left-align is the same as "indent at same level" (i.e. "indent +0 levels"), it means the first characters align, for example
TEXT AT LEFT MARGIN A BIT OF TEXTRight-align-of-first-word means the last characters of the first word align, for example
TEXT AT LEFT MARGIN A BIT OF TEXTRight-align-of-last-word means the last characters of the last word align, for example
TEXT AT LEFT MARGIN A BIT OF TEXTRight-aligning may involve shifting earlier shorter strings to match the longest string, for example
TEXT AT LEFT MARGIN A BIT OF TEXT A MUCH LONGER BIT OF TEXT
Phrase means: an uninterrupted series of keywords. Phrases may be treated as units, and people avoid putting newlines in phrases. Examples: CREATE OR REPLACE TRIGGER, DROP VIEW IF EXISTS, ROLLBACK TO SAVEPOINT, LEFT OUTER JOIN. Also I call >= ALL a phrase, though I acknowledge it does not fit my definition exactly.
Clause start means: a word or phrase that begins a clause. When discussing indenting or aligning, a statement start may be treated as a clause start.
Blogs:
Jeff Smith has examples of formatting with slightly different choices of words.
Format choices are often expressible as:
Change <word-or-phrase> to [<whitespace> +] <word-or-phrase> [+ <whitespace>].
For example, instead of expressing a choice as
Choice: Add spaces around arithmetic operators?
I say
Choice: Replace arithmetic operator with space + arithmetic operator + space?
For one Choice, that does not look simpler.
But for 30 Choices, it is much simpler because all Choices have the same "BNF".
Whitespace is zero or more
"space" U+0020
"tab" U+0009 rarely needed if indent units are expressed as spaces
"newline" go to start of next line, i.e. this is carriage return + line feed. also called line break.
"line feed" go to same position on next line, rarely needed
"+1 indent" adding one level of indentation
"-1 indent" removing one level of indentation, rarely needed
"+0 indent" to the same level, but this would always be assumed after newline
"0 indent" to level 0, which is the left margin
There are two default assumptions which are not Choices because they are universal defaults:
"Where K is keyword or literal, Change K + K to K + space + K"
"Change whitespace + newline to newline".
Choices are not cumulative. Once a Choice has been applied for a token, go to the next token.
The initial general assumption is: All text is changed so that all tokens are separated by a single space. As we will see in later sections, that assumption can be overridden, especially when there are comments and parentheses.
So most formatting choices can be phrased as:
Choice: [For a particular situation]
[before] add N newlines, indent N levels or left-align or right-align
[after] add N newlines
Example = "Choice: change FROM to 1 newline + indent+1 + FROM + [nothing]
"Choice: change WHERE to 1 newline + indent+0 + WHERE + [nothing]
"Choice: change AND to 1 newline + right-align + AND + [nothing]
SELECT skill FROM employees WHERE 0 = 0 AND 1 = 1;
Choice: change arithmetic operator to space + arithmetic operator + space?
Choice: change comparison operator to space + comparison operator + space?
Choice: && and || meaning OR are equivalent to AND and OR?
Choice: change . to space + . + space, or to .?
Choice: change || meaning concatenate to || + newline?
Choice: change semicolon to space + semicolon?
Choice: change comma to comma + space?
In this context a symbol is a token which is made up of non-alphabetic characters, such as = or <>. SQL does not require that symbols must be separated by white space, for example ('x'||'y'='z'AND-1="5") and ( 'x' || 'y' = 'z' AND -1 = "5" ) are the same. Therefore there are Choices for symbols but not for other tokens.
A binary operator is a symbol that separates two operands. The operation may be arithmetic, concatenation, bitwise, comparison, or assignment. Generally it does not matter -- all binary operators except . have the same rule, and the unanimous opinion of prescriptive guides is that that rule should be: change binary operator to space + binary operator + space = yes. But there are three possible exceptions: for && and || meaning OR, for ., and for || meaning concatenate.
By the way the SQL standard term is "dyadic operator" but according to Google Ngrams the term "binary operator" is much more common.
"&& and || are equivalent to AND and OR: yes" would only apply for MYSQL/MariaDB. See "Format AND / OR". There is no guidance for this setting.
"change . to space + . + space" means that qualified identifiers look like "table_name.column_name" instead of "table_name . column_name". It is clear from vendor examples that "table_name.column_name" is the preference, but I have seen exceptions.
"change || meaning concatenate to || + newline" is a Choice because there is a formatter that has special options for ||. I suspect that this choice exists for the sake of long literals. See section Long literals.
Another, more general, exception is: operators can be aligned if they are in lists. See section Format lists".
There is no Choice for other (monadic) operators such as unary minus or ~ for negation. They are always: space + operator, with no space before the next token.
There is no choice for : when it is used with a label in SQL/PSM. It is always : + space, with no space before the previous token.
"change semicolon to space + semicolon" is an uncommon choice but I have seen " ;" in the Microsoft manual and I have read that in eclipse it is considered a bug if the formatter always removes space before semicolon.
"change comma to comma space" is applicable only for commas that are not in a list (see Format lists). So the Choice affects function arguments like substring(1,2) or declarations like DECIMAL(5,3). Style guides for C, such as this one at umd.edu, say "Leave one space after a comma", but are probably not applicable.
Prescriptive guides:
Taranov gives an example where + instead of || is used for concatenation. Holywell says: "Although not exhaustive always include spaces: before and after equals (=) after commas (,) surrounding apostrophes (') where not within parentheses or with a trailing comma or semicolon."
Vendor Manuals:
In the MariaDB manual there is sometimes no spacing around = or before *.
Some Oracle examples are here.
Oracle SQL Developer has three options: "Spaces around operators (= < > + - * /)", "Spaces around commas", "Spaces around brackets [i.e. parentheses]"
Bloggers:
ApexSQL: there is an option for "add space inside parentheses"
Toad has a single option for "Plus-Minus-Mul-Div-Concat".
Choice: change table-name + ( to table-name + space + (?
Choice: change built-in-routine-name + ( to built-in-routine-name + space + (?
Choice: change user-defined-routine-name + ( to user-defined-routine-name + space + (?
Choice: change data-type + ( to data-type + space + (?
Choice: change ( to ( + space?
Choice: change ) to space + )?
Choice: align ) with statement start?
"change table-name + ( to table-name + space + (?" affects things like table-name(column-name) as in CREATE TABLE table-name(column-name), INSERT INTO table-name(column-name), CREATE INDEX index-name ON table-name(column-name). The most common setting is: yes.
"change built-in-routine-name + ( to built-in-routine-name + space + (?" is the decision whether to say UPPER(...) or UPPER (...). "change user-defined-routine-name + ( to user-defined-routine-name + space + (?" is the decision whether to say user_function(...) or user_function (...) Often the setting is No for both. But sometimes there are recommendations to say Yes for one and No for the other, so that it will be clear from the syntax what kind of routine you are calling. I am guessing that this happened because of C guidelines, for example this one at cmu.edu, that say "Do not put parens next to keywords. Put a space between. Do put parens next to function names. Keywords are not functions. By putting parens next to keywords keywords and function names are made to look alike." The Java Code Conventions are similar. But if that is the reasoning, it is an example of how C/Java rules get applied to SQL unnecessarily.
"change data-type + ( to data-type + space + (" is the decision to say VARCHAR (10) instead of VARCHAR(10). Really, nobody does. But I state it as a choice because it seems to be the only case where reserved-word + ( does not become reserved-word + space + (. Think of "IF (", "IN (", "WHILE (", ">= ALL (", "SELECT (", "VALUES (".
"change ( to ( + space" and "change ) to space + )"
is rare, but I have seen a recommendation for it.
Notice that this would change (((...)))
to ( ( ( ... ) ) )
.
For operator (expression) see Format symbols.
For (expression list) or (column list) see Format lists.
Prescriptive guides:
Mazur says: "Avoid spaces inside of parenthesis".
Vendor manuals:
In all vendor manuals, I was able to find parentheses in conditional expressions which did not contain a space after ( or a space before ).
The SQL standard document uses ( ( (
consistently.
Bloggers:
Michael Kruckenberg and Jay Pipes, in "pro SQL", say: "you may want to have your style guide require that SQL statements always include a space after the function name when using stored functions." This tip might depend on the proper setting of sql_mode=ignore_spaces in MySQL or MariaDB.
"Choice: Change ( to ( + newline + indent+1" in combination with
"Choice: align ) with statement start = yes" is something I have
only seen for
CREATE TABLE ... ( (Holywell)
ALTER TABLE ...( (Taranov)
and WITH ... (. (Taranov, Mazur, Benenson).
That is, the effect is
VERB name-or-expression ( )
Lukas Eder has some recommendations about when parentheses are good, and when they are not.
Choice: change simple comment to simple comment + 1 newline?
This is not really a choice, because nothing else makes sense. I cannot think of another time in SQL where <newline> cannot be eliminated or replaced by some other white space. Therefore it might be handy to put -- at the end of a line, to prevent a formatter program from removing the newline.
Choice: Put bracketed comment with following keyword.
Choice: Put bracketed comment after comma in a list.
Choice: Put bracketed comment on separate line, left-align?
Given
SELECT skill /* per test */ FROM employees;
only a human could know whether the comment is about the
column or about the FROM clause. Therefore, the human
who writes the statement should signal.
One way is to stick to the advice in section Comments
and always put bracketed comments before their subjects.
Another way is to put extra whitespace around the
comment, and a formatter program should take that into account.
(That is especially applicable for a comment within a phrase.)
If neither of those solutions can be trusted, the comment
will have to go on a line of its own.
Prescriptive guides:
Holywell has an example of a bracketed comment preceding the statement, on its own line.
Vendor examples:
Oracle: Bracketed comments may precede or may follow.
DB2: Bracketed comments are at end of line (same as simple comments) but can include newlines (unlike simple comments).
SQL Server: Bracketed multi-line comment precedes the whole statement.
MySQL: Hints come after the word SELECT and clearly belong with it.
MariaDB: Comments come at end of statement but before semicolon.
Tarantool: Comments come on the same line either before after the semicolon.
Blogs:
Toad has folding for: "multi-line comments, multiple single-line comments on consecutive lines"
Choice: Left margin: N?
Choice: Right margin: N?
Choice: Threshold = N?
Choice: If SQL statement length < Maximum line length: skip all other format choices?
Ordinarily Left margin = 1 because, as in editors, the minimum column number on a line is 1, not 0. However, for indenting a left edge is called a "zero point" as in editors, so "indent 0" means the Left margin.
Therefore Right margin - Left margin = Maximum line length. "Maximum characters per line" would be a better term but "Line length" is a term that infests style guides so I stick with it.
There is no consensus what Maximum line length should be, and computer style guides for other languages vary greatly. Suggestions include: 72, 80, 120. It might be useful to ask how wide is a screen, or how wide is a printed page, but those too are things which vary greatly, so it is prudent to assume that different people will choose different values.
"Threshold" -- not a common term -- is a position before the Right margin. The correct effects of passing the Threshold are not all clear to me, but I speculate that they include squeezing the contents because space is getting dear. For example, if Right margin = 80 and Threshold = 60 and length-of-text = 70, then one might ignore any changes which require further indenting, and ignore changes which require adding unnecessary spaces.
"If SQL statement length < Maximum line length: skip all other format choices = yes"
means you do not have to indent anything if the entire statement can fit on one line.
In other words, if you decide = yes, then this short line remains as is:
SELECT skill FROM employees;
But if you decide = no,
or,
if SQL statement length >= Maximum length,
then there is a need for more than one line.
The newlines should, if at all possible,
be on some lexical boundary.
Usually that means "newline for new clause" and I will devote several sections
to clauses, as well as to "Format lists".
Prescriptive guides:
Donnelly says: "Newlines should be used for any query that is at all complex or longer than 72 characters."
Mazur says: "The only time you should place all of your SQL on a single line is when you're selecting one thing and there's no additional complexity in the query:"
Vendor manuals:
Oracle SQL Developer once had options for "schema type = small SQL" and "threshold for small SQL = N", and ( as far as I can make out) you could use this for the equivalent of "Choice: no indenting if statement length less than N". This is no longer true. Nevertheless I quote an earlier description: "If you set the 'Schema Type' to small SQL - you have to mind the 'Threshold for small SQL' as that allows you to leave code lines that don't meet that threshold untouched, which is nice if you want to avoid unnecessary line breaks for very small lines of code."
Bloggers:
Factor says: "SQL code doesn't have to be broken into short lines like a Haiku poem ... to specify that there must always be a line-break between each phrase (before the FROM, ON, and WHERE clauses, for example) can introduce an unnecessary amount of white space into code."
Fontology says: "To determine line length for optimum readability, a good guideline is between 9 and 12 words for unjustified text." That establishes that "line length" can be the right phrase. But SQL Developer has an option "Max char line width".
This exchange on toadworld.com shows that Toad formatter by default goes as far as line length even if there are clauses, but (in Toad parlance) "folds" if line length is small. Also this shows how Toad "folds" when you change right margin. It is pretty clear that it initially will not fold, but will fold on clause boundaries as you reduce right margin. This is not quite the same as Choice: No new line and indenting if statement length less than maximum line length? it is more like Choice: No new line and indenting if CLAUSE length less than maximum line length? ... I think the meaning is: fold on clause boundary if and only if line too long.
In another toadworld.com exchange, a Quest employee says "I think it’s more intuitive that folding occurs at logical positions like open and close parens, entire statements, entire blocks, etc. Folding on individual clauses, expressions, and other pieces makes for a big headache and opens the door for all kinds of error."
Choice: The indenting unit is a tab, or a space?
Choice: For fixed indenting, the number of units is: N
Choice: For fixed indenting, after the first level, the number of units is: N
To be exact, a tab is Unicode character U+0009 CHARACTER TABULATION and a space is U+0020 SPACE. Usually the indenting unit is a space, and with a fixed-width font there is no problem with that. So choosing "number of units = N" means "fixed indenting is 2 spaces" for level 1, 2 * 2 spaces for level 2, and so on.
Suggestions for a fixed indenting amount are: 2, 3, or 4 spaces.
Ordinarily, the indent amount is the same for every level. Rarely, the indent amount is a larger amount (for example 4) for the first level, and a smaller amount (for example 2) for later levels.
Prescriptive guides:
Benenson says: 2 spaces. Salvisberg says: 3 spaces. Mazur says: 4 spaces. Holywell says: Indent column definitions by four (4) spaces within the CREATE definition.
Vendors:
Oracle SQL Developer has options for "indent N spaces" and "indent amount = space|tab", which to me indicates that they realize that some people prefer tabs.
The SQL standard uses 4 for the first level, 2 for levels after the first level.
Bloggers:
Factor says: usually 2 or 3 spaces.
The Toad formatter default is 3 spaces.
Andy Mallon goes into detail why spaces are preferable.
Bloggers:
ApexSQL, in SQL query basics – How to improve readability by formatting commas, spacing and alignment, says "you can also choose to add a fixed number of spaces after each column" (in a select list) option for "each clause and each clause set argument begins on separate lines"
Choice: Decide what is a clause according to common sources?
Choice: Decide what is a clause according to the statement's BNF?
When we're talking about indenting or aligning, we're talking about lining up with what a clause or expression is subordinate to.
Before saying "what to do with clauses" I must say "what is a clause".
There is no official definition that is appropriate.
We can point at a statement like
SELECT something FROM something WHERE something ORDER BY something;
and note that everybody agrees that SELECT / FROM / WHERE / ORDER BY
are "clause starts" (statement starts are a special form of clause starts).
So the general form for them is
something-that-is-not-clause-start keyword-phrase-as-clause-start [content]
But not every keyword phrase is a clause start. Define more narrowly.
My definition of "common SQL statements" is: statements that are defined in the SQL standard 9075-2 Foundation document, and supported in Oracle plus DB2 plus SQL Server.
My definition of "common clauses" is: clauses that cause indenting
in examples from the vendor manuals. I do not mean "always cause
indenting", in fact the examples are a minority of all cases.
The clauses that caused level-1 indenting in at least two examples
in two different vendor manuals are:
In ALTER TABLE: ADD, ALTER, DISABLE, DROP, ENABLE, REBUILD, WITH
In CREATE TABLE: ON, WITH
In CREATE TRIGGER: AFTER, BEFORE, FOR, REFERENCING
In CREATE VIEW: AS, SELECT
In DELETE: WHERE
In GRANT: TO
In INSERT: SELECT, VALUES
In SELECT: SELECT, FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT
In UPDATE: SET, WHERE
... This is a very conservative list.
... SELECT would cause indenting anyway because subqueries are clause boundaries.
If prescriptive guides are not explicit and vendor examples are lacking, there is still the official BNF Backus-Naur form or railroad diagram. Nobody explicitly says the BNF is an authority, but if you decide that it is, it might help you decide when deciding "when does a clause end" or "is this clause part of that clause".
The arbitrary and simplistic rules for using a BNF are:
1. A keyword plus a series of non-keywords is a clause.
(So a clause-start is the first keyword or phrase, and ends
when there is another keyword after the series of non-keywords),
or ends when the statement ends.)
2. The first clause (the statement start) may have sub-clauses.
3. Ignore BNF [options] words that are not in the statement that you are writing.
How well do they work?
I will take BNFs for some statements from the SQL standard, and apply them to minimal statements.
Assume that the clause choice is: change clause_start to indent-+1 + clause_start.
Assume that the lists choice is (we will define later in the Format lists section).
Assume that the threshold is 0.
BNF = CREATE ROLE <role name> [ WITH ADMIN <grantor> ]
The clauses are CREATE ROLE ... and WITH ... So:
CREATE ROLE r WITH ADMIN g;
BNF = CREATE [ RECURSIVE ] VIEW <table name> <view specification> AS <query expression> [ WITH [ <levels clause> ] CHECK OPTION ]
My example will not use the optional <levels clause> so the clauses are
CREATE VIEW ... and AS ... and WITH CHECK OPTION.
It is also decided (see "Format clauses") that SELECT is a clause too
but it is not in the main BNF and is therefore a sub-clause. So:
CREATE VIEW AS SELECT * FROM t WITH CHECK OPTION;
BNF = DECLARE <cursor name> <cursor properties> FOR <cursor specification>
The clauses are DECLARE and FOR. So:
DECLARE c CURSOR FOR SELECT * FROM t;
BNF = FETCH <fetch orientation> FROM <cursor name> INTO <fetch target list>
The clause starts are FETCH and FROM and INTO. So:
FETCH c FROM c INTO c;
BNF = GRANT <privileges> TO <grantee> [ { <comma><grantee> }... ] [ WITH HIERARCHY OPTION ] [ WITH GRANT OPTION ] [ GRANTED BY <grantor>]
The clause starts are GRANT, TO, WITH, GRANTED. So:
GRANT UPDATE (c) ON t TO c WITH GRANT OPTION;
BNF = PREPARE <SQL statement name> [ <attributes specification> ] FROM <SQL statement variable>
The clause starts are PREPARE, FROM. So:
PREPARE s FROM s;
BNF = SET CONSTRAINTS <constraint name list> { DEFERRED | IMMEDIATE }
The clause starts are SET, DEFERRED, IMMEDIATE. So:
SET CONSTRAINTS c DEFERRED;
BNF = SET SESSION CHARACTERISTICS AS <session characteristics list>
There are no clauses below SET. However, <session characteristics list>
is subject to the "Format lists" choice like any other list
but I said we will assume N = 5. So:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BNF = UPDATE <target table> [ FOR PORTION OF <application time period name> FROM <point in time 1> TO <point in time 2> ] [ [ AS ] <correlation name> ] SET <set clause list> [ WHERE <search condition> ]
I will not use FOR or AS or or FROM or TO.
Therefore the clauses are UPDATE... SET ... WHERE ... So:
UPDATE t SET c = v WHERE TRUE;
In all cases the results are acceptable.
Prescriptive guides:
No prescriptive guide explicitly suggests making a clause list by either of the methods in this section.
Examples from vendor manuals:
For "Decide what is a clause according to common sources?" I looked at what more than one vendor says. Of course, if you use only one vendor, your list could be larger.
I looked for examples that clearly do not work with any combination of Choices along with: Use the statement's BNF to decide what is a clause = yes. Despite looking at a large sample, I found only one: DB2:
GRANT SELECT, UPDATE ON TABLE DSN8A10.DEPT TO PUBLIC;In this case IBM seems to have decided that TO ... is a clause start but ON ... is not.
Bloggers:
This stackexchange thread had some people attempting to define "clause":
Tako Lee's list of options for sqlparser is large because Tako Lee chose to allow different settings for different clauses. I chose to list the settings only once unless I saw that prescriptive guides or vendors were saying things like "indent +2 if it is JOIN, indent +4 if it is AND, etc.".
Choice: change clause-start to newline + indent+0 + clause-start
Or change clause-start to newline + indent+0 + clause-start
Or change clause-start to clause-start + newline + indent+1
Or change clause-start to newline + indent+1 + clause-start
And
Choice: change SELECT to SELECT + newline + indent+1
Or
Choice: change clause-start to newline + indent+0 + clause-start + newline + indent-+1
Usually the prescriptive guides use the term "left align" where I use "indent+0".
Let us see the effect of each Choice setting on this SELECT statement:
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
Assume indent amount is 4 spaces.
"change clause-start to newline + indent+0 + clause-start"
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
"change clause-start to newline + indent+0 + clause-start + newline + indent+1"
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
"change clause-start to newline + indent+1 + clause-start"
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;This indenting helps emphasize that FROM and WHERE and GROUP BY are "subordinate" to SELECT.
"change SELECT to SELECT + newline + indent+1"
Benenson suggests "SELECT goes on its own line" and "Align all columns to the first column on their own line".
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
change clause-start to newline + right-align word + clause-start
Or change clause-start to newline + right-align phrase + clause-start
"change clause-start to newline + right-align word + clause-start"
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;
"change clause-start to newline + right-align phrase + clause-start"
SELECT DISTINCT id FROM employees WHERE id > 5 GROUP BY id;Notice in this example that there is a "river of white". All the clause-starts have ended up on the left side of the river; all the clause-ends (id and employees and id > 5 and id) have ended up on the right side. "Right align word" usually produces rivers but "right align phrase" does it more.
Prescriptive guides:
Donnelly says: new clauses makes new start of line. ON is not a clause-starter. rivers: yes, right-aligned indenting. Donnelly describes rivers of white thus: "The keywords that begin a clause should be right-aligned. The idea is to make a single character column between the keywords and their objects." Donnelly even shows an example where the statement-start (SELECT) is not at Left margin because SELECT must right-align by phrase with GROUP BY:
SELECT key_column, COUNT(1) FROM tablename GROUP BY key_column;
Holywell says: rivers: usually, right-aligned first-word indenting. "Rivers are bad in typography, but helpful here."
In Taranov examples, it looks like ON and AND are right-aligned, but FROM/WHERE/JOIN/GROUP BY/ORDER BY are indent+0. This was not explained so I did not suggest a Choice for it.
Salvisberg says: rivers: yes, right-aligned indenting. i.e. align on first word, not align on phrase.
I am going to be repetitious now. Are subordinate clause-starts left-aligned or right-aligned? Well, this is what left-align looks like:
UPDATE IGNORE SET WHERE ORThis is what right-align looks like:
UPDATE IGNORE SET WHERE ORThus, the last character of each word is in the same position. This is what right-align with the last word looks like:
UPDATE IGNORE SET WHERE ORThus, when the clause-start is a phrase rather than a single word, the alignment is with the last word of the longest clause-start (IGNORE in this case) rather than with the first word (UPDATE in this case). If right-aligning is used, then number-of-spaces in indenting units will not matter.
* Oracle SQL Developer has an option: "Right-Align Master Keywords" (but the number of "master" keywords is limited (as in SELECT ... INTO ... WHERE)), and an option "Indent Main Keyword 2x" (they mean the lines after the verb), It is relatively new.
James Thigpen has an example like Donnelly's, where SELECT is moved over to right-align with GROUP BY.
Choice: Format clauses by aligning contents?
If we are aligning contents, then all the clause starts are on the left of the statement or subquery, and all the contents (the clause ends) are on the right. This can also be accomplished with right-aligning phrases, but when we align contents we are putting the spaces after the keyword phrase, not before the phrase.
Extract from an example from Taranov: (Notice that the commas are not part of the deal -- see also the dbvis support question.)
SELECT t1.Value1 AS Val1 , t1.Value2 AS Val2 , t2.Value3 AS Val3 INTO #Table3 FROM CTE_MyCTE AS t1 ORDER BY t2.Value2;
Extract from the TPC-H benchmark requirements:
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1-l_discount)) as sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price,avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' -interval '[DELTA]' day (3) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus
Prescriptive guides:
Mazur says: "Some IDEs have the ability to automatically format SQL so that the spaces after the SQL keywords are vertically aligned. This is cumbersome to do by hand (and in my opinion harder to read anyway) so I recommend just left aligning all of the keywords:"
Choice: Lists maximum number of items per line = N?
Choice: Lists start with: space, or newline + usual fixed indent + N levels, or newline + align past preceding word?
Choice: Lists , i.e. end-of-line comma becomes , + newline or newline + , or newline + , + space?
Choice: Lists have similar parts align?
Choice: Lists ( i.e. initial left parenthesis becomes ( or newline + ( or newline + ( + newline?
Choice: Lists ) i.e. final right parenthesis becomes ) or newline + )?
For this section, list means: a comma-delimited list of columns or expressions. Examples of lists are SELECT expression-list FROM ..., INSERT INTO t (column-list) VALUES (expression-list), IN (expression-list), CREATE TABLE ... (table-element-list) routine-call (argument-list), routine-definition (parameter-list). I have seen formatters that have separate options for each type of list, but none of the prescriptive guides recommend that. I will also discuss CREATE TABLE ... (table-element-list) in a later section.
SQL people are used to seeing tables. Tables have rows and columns. Therefore SQL people are used to seeing items on separate lines, with the start of each item aligned. Therefore in SQL this is okay even if other languages do not do it. Sometimes a list which is vertical is called a "stack" or "stacked list".
Suppose that we're looking at
SELECT DISTINCT alpha + 100 AS c1 /* one */, beta AS c200 /* two */, epsilon - 2 AS c3.
Set maximum number of items per line = 1. That means that for every item there is a newline:
SELECT DISTINCT alpha + 1 AS c1 /* one */, beta AS c200 /* two */, epsilon - 2 AS c3I have seen formatters that allow for setting N to a number greater than 1, although the option is not in prescriptive guides.
Set start with = newline. That means that there is a newline before the first item:
SELECT DISTINCT alpha + 100 AS c1 /* one */, beta AS c200 /* two */, epsilon - 2 AS c3
Set comma becomes newline + comma + space. That means the commas come at the start rather than at the end:
SELECT DISTINCT alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3Arguments that I have seen for comma-at-start style include:
Set have similar parts align = yes. That means not only the item starts, but also the operators and ASes and comments align:
SELECT DISTINCT alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3Again, I have seen formatters that allow for this, although the option is not in prescriptive guides (except for other statements).
Now put the list inside parentheses.
Set ( = newline + ( + newline. That means we go now have a ( on its own line aligned with the commas:
SELECT DISTINCT ( alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3)
Set ) = newline + ) That means the parentheses are now aligned:
SELECT DISTINCT ( alpha + 100 AS c1 /* one */ , beta AS c200 /* two */ , epsilon - 2 AS c3 )
Thus there are only 6 basic "choices" but some of the choices have more than 2 possible settings, and the choices can be combined independently, and sometimes people will want to have different settings for each type of list. So there are hundreds of possible settings.
Prescriptive guides:
Salvisberg, Taranov say: commas-at-start = yes. And Salvisberg says: space-after-the-comma = yes.
Celko, Mazur, Holywell say: commas-at-start = no.
Salvisberg says "operators aligned" -- perhaps he means assignment operators.
Vendor manuals:
Oracle: comma-lists mostly on same line but indenting when line too long, comma-at-end, no AS, ) at end
DB2: keywords right aligned = no (2 spaces) and comma-lists mostly on same line
SQL Server: comma-lists mostly on same line
SQL Developer has an option "Before comma" which is equivalent to "Commas first = yes".
Bloggers / Others:
Joe Celko's programming style" says: "“Put commas at the end of a line, not the start. A comma, semicolon, question mark, or periods are visual signals that something has just ended, not that it is starting. Having a comma at the start of a line will make the eye tick leftward as it looks for that missing word that was expected before the comma.”
Simon Holywell says: "There’s no way of being polite here - this [i.e. comma-at-start = yes] looks hideous, weird and totally unconventional in a bad way."
ApexSQL says: "However, most often commas as list-separators are written at the beginning of a line to make commenting of list members easier in development. This is also a standard in Adventureworks2012 database:"
Factor says: "You'd probably also want to insist on a new line after each column definition." (but in SELECT etc.) "Now, no typesetter would agree to arrange this [list of my favourite cheeses] in a vertical list, because the page would contain too much white space... ... ...and they'd be most unlikely to want to put commas at the beginning of list elements. However, if the list elements consisted of longer strings, then it would be perfectly acceptable. In the same way, the rules for formatting SQL have to take into account the type of SQL statement being formatted, and the average length of each list element ... Commas, used as list separators, are often put at the beginning of lines. I realize that it makes the "commenting out" of list members easier during development, but it makes it difficult for those of us who are used to reading English text in books. Commas come at the end of phrases, with no space before them, but if they are followed by a word or phrase on the same line, then there is a space after the comma."
Factor also says: "Put a line-break between list items only when each list item averages more than thirty or so characters."
Re parentheses, see Firefox Data Documentation SQL Style Guide re parentheses.
Oracle SQL Developer has multiple options which would especially affect select lists and UPDATE ... SET lists and WHERE lists with multiple conditions: "Align Equal signs (= < > ...)", "Align Operator signs (* - + ...)", "Align AS keyword", "Align on commas", "Align line comments [i.e. simple comments]", "Align || at end of line", "Align variable declaration for stored procedures". Taken together, these options all correspond to "Choices" and presumably a procedure would look like this if they were all on:
CREATE PROCEDURE p ( first_parameter CHAR(1) := 'a'; second_parameter CHAR(22) := 'b'; ) INSERT INTO t1 SELECT first_column AS first_alias, second_column AS b , -- comment really_big_column AS c -- comment FROM t2 WHERE first_column = 12345 -- comment AND second_column <= 0 -- comment ORDER BY fifth_column || sixth_column ...
Bloggers:
Benenson says: SELECT goes on its own line".
From an explanation on
support.dbvis.com
"
1) The formatter indents the column names, ignoring the commas. So
with an indentation size of 4, the column name is indented to start in
column 4. If you have also selected Line Break Before Comma, the
comma will be appear before the column name. This is the intended
behavior.
"
Thus: with indenting or aligning, plus commas before, you are not trying to line up
the commas, you are trying to line up the contents.
The formatter comes from
SQLinForm.
Oracle SQL Developer had an option "For "Number of columns per line"
Toad had an option for "Stacked on more than N" (default 3), see here and here.
Oracle PLSQL Coding and Naming Standards says "Parameters must be stacked unless less than three."
... These would all correspond with "Choice: Lists maximum number of items per line = 3".
Choice: same as for Format lists?
or
change AND/OR to newline + indent+1 + AND/OR
or
change AND/OR to newline + indent+0 + AND/OR
or
change AND/OR to newline + right-align + AND/OR
or
change AND/OR to AND/OR + newline + indent-+0
Conditions are really just expressions that can return boolean values, so they can appear anywhere, not just after WHERE or HAVING or ON or IF or WHERE. But here I will illustrate with WHERE.
For simple conditions without AND/OR most examples show: WHERE condition.
The "same as for Format lists" setting would mean that a a series of conditions can be regarded as a list, with each AND/OR condition being one "item" of the list. Potentially that would imply that a string of conditions would have all their parts aligned, a setting that we might also see for CASE operations, later.
The "change AND/OR to AND/OR + newline + indent-+0" setting means AND/OR go to the end of a line, not the start of the next line. This would be compatible with the idea that AND/OR are not "clause starts", they are "operators". And operators belong at the end of a line, if we follow Java conventions.
Prescriptive guides:
Extract from a Taranov example:
INNER JOIN dbo.Table3 AS t2 ON t1.Value1 = t2.Value1 WHERE t1.Value1 > 1 AND t2.Value2 >= 101This fits with "clause starts on the left, clause ends on the right", and "change AND/OR to newline + right-align + AND/OR"
Holywell:
WHERE ... OR ... OR ...This fits with "same as for Formats list", "similar parts aligned", and "change AND/OR to newline + indent+1 + AND/OR".
Mazur fits with "change AND/OR to AND/OR + newline + indent+0" (i.e. indent+1 from the where)
where ... and ...Thus Mazur is the only one who says AND/OR-at-end. "all conditions within WHERE are indented (4 spaces), AND/OR is at end of line"
Benenson fits with "change AND/OR to newline + indent+0 + AND/OR"
WHERE ... AND ...
Salvisberg, Holywell fit with: "change AND/OR to newline + right-align + AND/OR" (i.e. right-align from the where)
WHERE ... AND ... WHERE ... OR ...
Vendor manuals:
Example from Oracle:
WHERE NOT (job_id = 'PU_CLERK' AND department_id = 30)
Example from SQL Server:"
WHERE EmployeeKey <= 500 AND LastName LIKE '%Smi%' AND FirstName LIKE '%A%'; Examples from DB2: WHERE WORKDEPT = 'E11' AND NOT JOB = 'ANALYST' + WHERE EDLEVEL > 12 AND (WORKDEPT = 'E11' OR WORKDEPT = 'E21')... Quick summary: none of the vendor manuals' examples follow any special rules.
SQL Developer has an option for "Indent AND/OR" (they mean one additional fixed-units indent so AND/OR is one level after WHERE).
Bloggers:
From an article in Oracle Magazine:
WHERE ... AND ...;This fits with "same as for Formats list", "similar parts aligned", and "change AND/OR to newline + indent+1 + AND/OR".
Choice: change subquery start to subquery start, or newline + indent-+1 + subquery start
The "subquery start" is ( or ALL/IN/ANY/EXISTS ( or SELECT. The Choices for ()s are the same as in section Format lists; for examples I chose the newline settings.
If "change subquery start to newline + indent-+1 + subquery start"
WHERE ( SELECT ... )
If "change SELECT to newline + indent+1 + SELECT + newline" and "change FROM to newline + indent+2 + FROM".
CREATE VIEW v AS SELECT c FROM t;Notice that the FROM is indented 4 spaces under the SELECT, not 4 spaces under the CREATE, because sub-clauses are always indented from the main clause, not from the statement start.
Prescriptive guides:
Salvisberg gives this example:
SELECT emp.last_name ,emp.first_name FROM employees emp WHERE emp.employee_id IN (SELECT j.employee_id FROM jobs j WHERE j.job_title like '%Manager%');
Benenson says: (subquery-is-clause-start: yes)
Choice: none, setting is as described in section Format parentheses.
In section Format subqueries the Choices involved putting ) at the end of a SELECT without a newline, or aligning ) with (. But in examples for WITH I regularly see a third way: align ( with WITH. For example (this is from a mozilla.org SQL Style Guide)
WITH sample AS ( SELECT client_id, FROM main_summary WHERE sample_id = '42' )
Prescriptive guides:
Taranov, Mazur, Benenson examples all show Align right parenthesis with WITH = yes.
Choice: change JOIN to newline + indent+1 + JOIN, or newline + indent+0 + JOIN?
Choice: change ON to ON, or newline + indent+1 + ON?
Choice: old style join like JOIN?
JOIN is a sub-clause of FROM, so "indent+1" means "indent one level after FROM", which may or may not be the same thing as "indent one level after SELECT". ON is a sub-clause of JOIN, so "indent+1" means indent one level from JOIN".
"old style join like JOIN = yes" would suggest that however you would format "CROSS JOIN" is how you would format ",". But the vendor manual examples which have old style joins (Oracle, DB2, Tarantool) have them as comma-separated lists ("PARTS, PRODUCTS", "employees, departments", "t1 AS a, t1 AS b").
Prescriptive guides:
Taranov shows: change JOIN to newline + right-align word, change ON to newline + right-align phrase + ON
SELECT INNER JOIN ... ON ...
Holywell shows: change JOIN to newline + indent+1 + JOIN, change ON to newline + indent+0 + ON
FROM ... JOIN ... ON ...Holywell explains: "Joins should be indented to the other side of the river and grouped with a new line where necessary." But in another example Holywell shows JOIN and ON right-aligned with SELECT and FROM.
Salvisberg shows: change JOIN to newline + indent+1 + JOIN, + names align (as if it is a list) and change ON to ON
FROM ... JOIN ... ON ...
Mazur, Benenson, Donnelly say or show: change JOIN to newline + indent+0 + JOIN, change ON to ON
FROM ... JOIN ... ON ...This is also the style of blogger James Thigpen. Mazur suggests that ON might be on a different line if the condition is complex.
Examples in vendor manuals:
SQL Developer has an option for "JOIN statements" (they mean JOIN clauses).
Choices: none needed.
If there is a list of target columns, or if there is a list of VALUES columns, then see Format lists. If there is a VALUES or SELECT clause, then see Format clauses. So the Choices are already made, there is nothing to do now except give an example. This is one from Salvisberg, who believes that the list of target columns should always be specified, and consistently chooses the setting that puts commas in front, and believes that a clause start should be changed to newline + indent+1 + clause start.
INSERT INTO departments (department_id ,department_name ,manager_id ,location_id) VALUES (departments_seq.nextval ,'Support' ,100 ,10);
Choices: none needed
As with INSERT, the Choices are already made, there is nothing to do now except give an example. This one is from Holywell, who believes clause starts should be right-aligned and commas in comma lists should be at end.
/* Updating the file record after writing to the file */ UPDATE file_system SET file_modified_date = '1980-02-22 13:19:01.00000', file_size = 209732 WHERE file_name = '.vimrc';
Choice: see Format lists.
The table-element list in CREATE TABLE is a list.
Therefore the same 6 choices apply, although you might choose different settings.
This is what a CREATE TABLE will look like if the choices are:
Lists maximum number of items per line = 2
Lists start with: newline + usual fixed indent + 1 level
Lists , i.e. end-of-line comma becomes , + newline
Lists have similar parts align = yes
Lists ( becomes (
Lists ) becomes )
Alternate phrasing: Inside CREATE TABLE any list of columns or constraints is a comma-delimited list so the "Format lists" rules apply.
CREATE TABLE employees (id INTEGER PRIMARY KEY, class INTEGER, name CHAR(255) UNIQUE DEFAULT 5, markup INTEGER);
Probably "items per line = 2" is not an optimal choice; recommendations are either 1 or infinite. The "similar parts align" works well because the parts (name, data type, column constraint, default) are well defined and ordered, but would look less lovely if the DEFAULT clause was long and came before UNIQUE. And, although saying ") becomes )" saves screen space, it is not the most common choice. So people may be more likely to choose "items per line = 1" and "similar parts align = no" and ) becomes newline + ) which results in
CREATE TABLE employees (id INTEGER PRIMARY KEY, class INTEGER, name CHAR(255) UNIQUE DEFAULT 5, markup INTEGER );If you decide that you like aligning parentheses, then you will probably want to be consistent and align CASE with END, or BEGIN with END.
Choice: table constraints come after column definitions?
or Choice: column constraint immediately after column data type?
I can imagine it is good to add "UNIQUE (class, name)" as a final item on a separate line, so that it does not interrupt the column list. But I can imagine it is good to put "UNIQUE (class, name)" after the definition of name and class. (Remember that NOT NULL is a column constraint and it comes immediately after the data type, so why not other constraints?)
If there is a big long foreign-key constraint, it might be broken up because the earlier "Choices" determined that column lists get indented with fixed indenting, aligned parentheses, etc. For example
CONSTRAINT constraint_name FOREIGN KEY REFERENCES table_name (column_name, column_name )
If there is a long check constraint, it might be broken up because the earlier "Indent clauses" choices decided what should happen with AND / OR indenting.
Prescriptive guides:
Holywell says: "If it make senses to do so align each aspect of the query on the same character position. For example all NOT NULL definitions could start at the same character position. This is not hard and fast, but it certainly makes the code much easier to scan and read."
Holywell example:
CREATE TABLE staff ( PRIMARY KEY (staff_num), staff_num INT(5) NOT NULL, first_name VARCHAR(100) NOT NULL, pens_in_drawer INT(2) NOT NULL, CONSTRAINT pens_in_drawer_range CHECK(pens_in_drawer >= 1 AND pens_in_drawer < 100) );This example also is consistent with Align right parenthesis with statement start = yes. + column constraint immediately after column data type = yes
Examples from vendor manuals
CREATE TABLE EMPLOYEE_SALARY (DEPTNO CHAR(3) NOT NULL, DEPTNAME VARCHAR(36) NOT NULL, EMPNO CHAR(6) NOT NULL, SALARY DECIMAL(9,2) NOT NULL WITH DEFAULT)Choices: no semicolon, everything upper case, INTEGER rather than INT, adjectives left-aligned, commas follow.
CREATE TABLE dbo.PurchaseOrderDetail ( PurchaseOrderID int NOT NULL REFERENCES Purchasing.PurchaseOrderHeader(PurchaseOrderID), LineNumber smallint NOT NULL, ProductID int NULL REFERENCES Production.Product(ProductID), UnitPrice money NULL, ... )Notice: semicolons, keywords upper case, object names Pascal Case, adjectives not aligned, commas follow, INT preference. + REFERENCES (because it is a constraint?) is newline + more indent
CREATE TABLE departments_demo ( department_id NUMBER(4) , department_name VARCHAR2(30) CONSTRAINT dept_name_nn NOT NULL , manager_id NUMBER(6) , location_id NUMBER(4) , dn VARCHAR2(300) ) ;Notice: semicolons at end, keywords upper case, object names snake case, adjectives left-aligned, commas precede. + space before semicolon + column constraint immediately after column data type
Another Oracle CREATE TABLE example:
CREATE TABLE books (title VARCHAR2(100), author person_t);
Tarantool:
CREATE TABLE employees_demo (employee_id INTEGER, first_name STRING, last_name STRING NOT NULL, email STRING NOT NULL, phone_number STRING);
SQL Standard 9075-12 document:
CREATE TABLE INFORMATION_SCHEMA_CATALOG_NAME ( CATALOG_NAME SQL_IDENTIFIER, CONSTRAINT INFORMATION_SCHEMA_CATALOG_NAME_PRIMARY_KEY PRIMARY KEY ( CATALOG_NAME ), CONSTRAINT INFORMATION_SCHEMA_CATALOG_NAME_CHECK CHECK ( 1 = ( SELECT COUNT(*) FROM INFORMATION_SCHEMA_CATALOG_NAME ) ), CONSTRAINT INFORMATION_SCHEMA_CATALOG_NAME_FK_CATALOG_NAME FOREIGN KEY ( CATALOG_NAME ) REFERENCES DEFINITION_SCHEMA.CATALOG_NAME );Final ) is on a separate line aligned with ), space after (, space before ) -- those are choices in Section Format parentheses. But for the CHECK clause ) ) is on the last line of the subquery -- that is a choice in Section subqueries. All words upper case -- those are choices in Names sections.
Choice: AS SELECT becomes AS SELECT, or newline + AS SELECT, or AS + newline + fixed-indent + SELECT?
Choice: WITH CHECK OPTION aligns with AS?
The "AS SELECT" setting looks like:
CREATE VIEW view_name AS SELECTThe "newline + AS SELECT" setting looks like
CREATE VIEW view_name AS SELECTThe "AS + newline + fixed-indent + SELECT" setting looks like
CREATE VIEW view_name AS SELECTI cannot think of any principle that applies, except perhaps that AS is a special clause. None of the prescriptive guides says anything about this, so I depend entirely on examples from vendors and blogs.
In this list, some vendors appear more than once because they use more than one setting.
AS SELECT: nobody
newline + AS SELECT: DB2, Celko
newline + AS + fixed-indent + SELECT: nobody
AS + newline + fixed-indent + SELECT: Oracle
newline + AS + newline + SELECT: SQL Server, Factor
newline + fixed-indent + AS SELECT: Oracle, DB2
Vendors:
Oracle: see here. Inconsistent. WITH READ ONLY aligns with AS.
DB2: see here.
SQL Server: see here. Consistent. WITH CHECK OPTION aligns with AS.
Blogs:
Joe Celko, SQL for Smarties: Advanced SQL Programming shows WITH CHECK OPTION is level-1 indent from CREATE.
Sharad Maheshwari and Ruchin Jain, "DBMS – Complete Practical Approach" say: "Everything after the AS keyword must be enclosed in parentheses."
The Teradata manual has
CREATE VIEW dept AS SELECT deptno(TITLE 'Department Number'), deptname(TITLE 'Department Name'), loc (TITLE 'Department Location'), mgrno(TITLE 'Manager Number') FROM department;which is consistent with "AS + newline + fixed-indent + SELECT".
Choice: indent first line of routine?
Choice: AS as in CREATE VIEW?
"indent first line of routine" is a Choice because I have seen it occasionally. But zero prescriptive guides recommend it and zero vendor examples show it. I believe that there is an influence from other languages that have:
function_name_definition () { statements; }The equivalent of { ... } is BEGIN ... END, so:
create_routine_statement_start () BEGIN ... END;I call that "the usual way".
The parameter list might be followed by a descriptive phrase, for which I know of no rule. I have no guidance for whether LANGUAGE or other characteristics phrases are clause starts.
For "AS as in CREATE VIEW" I chose not to repeat all the Choices in Format CREATE VIEW. I have assumed that, whatever you like for newlines and indenting there, you will like here too. (This applies only for the dialects that have CREATE ... AS ... or CREATE ... IS ...).
Vendor examples:
Oracle: the usual way.
DB2: inconsistent.
SQL Server: the usual way, with newlines before and after AS.
MySQL: the usual way.
MariaDB: inconsistent.
Bloggers:
Factor: (in Chapter 1 "Writing Readable SQL", Listing 1-7) shows the usual way, with newlines before and after AS.
Choice: change create-trigger clause to newline + indent+0 + create-trigger clause? This is not really different from the Choice for Format CREATE PROCEDURE or CREATE FUNCTION. We have an initial CREATE ... object_name, some modifiers, and then a triggered-action statement which is usually a compound BEGIN/END statement.
According to our guidance via the SQL standard BNF, ON should be treated as a clause start. However, the only prescriptive guide that has a CREATE TRIGGER example (Salvisberg) shows otherwise. And the vendor manual examples are inconsistent.
Prescriptive guides:
Salvisberg shows: change create-trigger clause to newline + indent+0 + create-trigger clause = yes:
CREATE OR REPLACE TRIGGER dept_br_u BEFORE UPDATE ON departments FOR EACH ROW BEGIN END;/
Vendor manual examples:
Oracle shows BEFORE and ON as clauses. So is REFERENCING. So is FOR EACH ROW.
DB2 shows AFTER and FOR EACH ROW and WHEN are clauses. Indents the BEGIN/END! Sometimes indents twice!
SQL Server shows ON and FOR are clauses.
Mysql shows BEFORE is BEFORE. FOR EACH ROW is newline + indent+0 + FOR EACH ROW. Or BEFORE is newline + indent+1 + BEFORE. FOR EACH ROW is newline + indent+1 + FOR EACH ROW. inconsistent.
MariaDB shows indent+1 for AFTER, indent+2 for UPDATE.
Tarantool shows BEFORE and FOR EACH ROW are not indented. BEGIN is indented!
Choice: change WHEN/END to newline + indent+0 + WHEN/END (i.e. they left-align with CASE)?
Choice: change WHEN/END to newline + indent+1 + WHEN/END?
Choice: change THEN to THEN, or to newline + indent+2 + THEN?
Choice: Change END to END, or to newline + indent+0 + END?
The indenting is relative to CASE.
CASE ... END is analogous to a statement. So whenever a prescriptive guide says indent=yes or left-align=yes for a statement, I would expect it to say something similar for CASE ... END. So in a consistent world there would be no need for this "Format CASE expressions" section, but I have to add it because prescriptive guides and formatters have options for it.
If there is an AS alias in a select list, it follows END on the same line.
I have not seen suggestions that a series of WHEN ... THEN ... should be treated as a "list" and therefore be aligned as in Format lists. I have, however, seen a suggestion that CASE expressions should be treated as items in a list, for example
SELECT '' AS a, CASE WHEN END AS b, '' AS c ...This could be stated for subqueries too.
Prescriptive guides:
Taranov shows: change WHEN to WHEN, change THEN to newline + indent+0 (align with WHEN), END aligns with CASE
CASE WHEN ... THEN ... WHEN ... THEN ... END
Holywell shows: change WHEN/END to newline + indent+0 + WHEN/END = yes: and THEN on the same line
CASE ... WHEN ... THEN ... WHEN ... THEN ... END
Benenson says (re CASE statements) "try to align WHEN, THEN, and ELSE together inside CASE and END"
CASE WHEN ... THEN ... ELSE ... ENDNot quite like Taranov because ELSE is elsewhere.
Mazur, Salvisberg show: change WHEN/ELSE to newline + indent+1 + WHEN/ELSE = yes: and THEN on the same line
CASE ... WHEN ... THEN ... WHEN ... THEN ... ELSE ... ENDSalvisberg adds "Try to use CASE rather than an IF statement with multiple ELSIF paths." Mazur adds "Each
when
should be on its own line (nothing on the case line)
and should be indented one level deeper than the case line.
The then
can be on the same line or on its own line below it, just aim to be consistent."
Vendor examples:
Oracle: inconsistent
DB2: same as Mazur, Salvisberg
SQL Server: same as Mazur, Salvisberg
MySQL: inconsistent
MariaDB: single line
Tarantool: single line
Oracle SQL Developer has separate options for indent of CASE, WHEN, THEN, and AND/OR.
Bloggers:
Toad formatter has an option "Position THEN on the same line".
Factor shows: change WHEN/ELSE to newline + indent+1 + WHEN/ELSE = yes, change THEN to newline + indent+2 + THEN
CASE ... WHEN ... THEN ... WHEN ... THEN ... END
SQL Prompt has options for placing WHEN / ELSE / THEN on new lines, and for aligning ELSE with WHEN.
Holywell example:
SELECT CASE postcode WHEN 'BN1' THEN 'Brighton' WHEN 'EH1' THEN 'Edinburgh' END AS city FROM office_locations WHERE country = 'United Kingdom' AND opening_time BETWEEN 8 AND 9 AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1');... It looks like "right align word" but then there is no river. Or, there are two rivers.
Choice: Format blocks the usual way?
For this purpose a "block" is a group of statements enclosed by a "control statement".
In standard SQL (SQL/PSM) the control statements that enclose statements are: BEGIN ... END, CASE ... END CASE, IF ... END IF, LOOP ... END LOOP, WHILE ... END WHILE, REPEAT ... END REPEAT, FOR ... END FOR. Naturally in Oracle or Oracle imitators (PL/SQL) the list differs, but the effect of saying "yes" is the same: Any statements between the verb and the END are indented N spaces. The END aligns with the verb. WHEN and ELSE are treated as control verbs. IF condition THEN and WHEN condition THEN are single lines. Example (with indent-amount = 2):
BEGIN sql-statement; LOOP sql-statement; IF condition THEN sql-statement; ELSE sql-statement; END IF; END LOOP; END;That is what I have seen in every source that has a consistent set of examples. Therefore I think it is okay to say that is the "usual" way.
Choice: labels on same line as control verb?
If labels on same line = yes, using an SQL/PSM label for the example:
BEGIN loop_label: LOOP sql-statement; END LOOP loop_label; END;If labels on same line = no, using a PL/SQL label for the example:
BEGIN <<loop_label>> LOOP sql-statement; END LOOP loop_label; END;Notice, in the first example, how END aligns with loop_label not with LOOP. I do not know whether this is common, but it is in the MySQL manual.
Choice: FOR ... LOOP is all one line: yes or no.
If all one line = yes:
FOR iteration LOOP
If all one line = no:
FOR iteration
LOOP
That is the only example that I could find of a difference between
Salvisberg's recommendations and the Oracle manual.
Choice: a short IF ... END IF; can go on one line: yes or no
This is about whether the "usual" form is followed rigidly for blocks,
even if for non-blocks it is okay to avoid "haiku".
Remember: Modifiers of verbs go with the verb (see definition of "phrase"), so BEGIN ATOMIC has ATOMIC on the same line as BEGIN. And DECLARE EXIT HANDLER FOR NOT FOUND is a single line.
Prescriptive guides:
Most of the prescriptive guides do not have anything to say about control statements.
Salvisberg says: indent blocks the usual way: yes. labels on same line: no. FOR ... LOOP is all one line: no. Salvisberg also advises: avoid goto because there is no indenting that can indicate its effect on the flow, and always have labels for loops. Example:
<<while_loop>> WHILE (i <= co_max_value) LOOP i := i + co_increment; END LOOP while_loop;... So different line for LOOP, and label has its own line.
Vendor manual examples:
Oracle: indent blocks the usual way: yes. labels on same line: no. FOR ... LOOP is all one line: yes. a short IF ... END IF; can go one line: no
DB2: THEN is on same line as IF: yes. a short IF ... END IF; can go one line: no
SQL Server: the examples given on this page are so wildly inconsistent that we'd have to say: there is no rule. a short IF ... END IF; can go one line: no
MySQL: indent blocks the usual way: yes. labels on same line: yes. FOR ... LOOP is all one line: n/a. a short IF ... END IF; can go one line: yes
MariaDB: not consistent, ignored
Oracle SQL Developer has an option "After statements" and one of the options is "double break", that is, put two newlines after each statement.
Choice: Format blocks as you do for Format lists?
The previous section was about formatting blocks the usual way. The reason you might want indenting some other way would, I suppose, be: "because that is what we do in [insert language name here]".
I will illustrate WHILE in SQL/PSM blocks but the same considerations are applicable for other conditional statements and for PL/SQL.
Once again, we can do "Choice re-use".
Start with the six Choices at the beginning of section Format lists.
Substitute "statement" for "item", substitute ";" for ",", substitute "DO" for "(", substitute "END" for ")".
Now we have:
Choice: Lists maximum number of statements per line = N?
Choice: Lists start with: space, or newline + usual fixed indent + N levels, or newline + align past preceding word?
Choice: Lists ; i.e. end-of-line semicolon becomes ; + newline or newline + ; or newline + ; + space?
Choice: Lists have similar parts align?
Choice: Lists DO i.e. initial left parenthesis becomes DO or newline + DO or newline + DO + newline?
Choice: Lists END i.e. final right parenthesis becomes END or newline + END?
For example, if
maximum number of statements per line = 1,
lists start with usual fixed indent + 1 level,
end-of-line semicolon becomes ; + newline,
similar parts align = no,
DO becomes newline + DO + newline,
END becomes END,
we get:
WHILE condition DO
statement;
statement; END WHILE;
Now that we are talking about SQL as a procedural language, it is more legitimate to compare with indentation styles in other languages. The functional equivalent for SQL block begin ... end is usually { ... } braces. So it is useful to look at the Wikipedia article Indentation style section "Brace placement in compound statements".
The following is an SQL variation of the table in that section. I am assuming indentation with 4 spaces, and space-before-semicolon = no.
DO ... END placement Style -------------------- ----- WHILE condition DO K&R and variants statement; statement; END WHILE; WHILE condition Allman DO statement; statement; END WHILE; WHILE condition GNU DO statements END WHILE; WHILE condition Whitesmiths DO statements END WHILE; WHILE condition Horstmann + Pico DO statement; statement; END WHILE; WHILE condition DO Ratliff statement; statement; END WHILE; WHILE condition Lisp DO statement; statement; END WHILE; WHILE condition Haskell DO statement ; statement ; END WHILE
Choices: none needed
As with INSERT UPDATE etc., the Choices are already made, there is
nothing to do now except give an example. This one is
from Taranov, who believes definitions should be aligned
(as in Format CREATE TABLE), and data types should be lower case
(as in Data types).
DECLARE @myGoodVarchareVariable varchar(50);
DECLARE @myGoodNVarchareVariable nvarchar(90);
DECLARE @myGoodCharVariable char(7);
DECLARE @myGoodNCharVariable nchar(10);
If there was a default then it would not cause a newline, because DEFAULT was not established to be a clause start according to any of the criteria shown in previous sections.
If there was a multiple-column declaration then the Choice that would apply is as in Format lists.
Choice: if overflow ignore Maximum line length, or word wrap, or format = no.
An "overflow" occurs when a line of SQL text can go past the Right margin.
The previous sections have shown many ways to format an SQL statement onto multiple lines. Therefore overflow should be rare.
But -- hopefully very rarely -- a single clause or a single list item will cause overflow past the maximum length. What then?
Setting "ignore Maximum line length" means "go ahead and overflow". It is not a big deal if the line is on a screen, just result will be a scroll bar or a return to the leftmost column position.
Setting "word wrap" means what in CSS would be called word-wrap: normal or in my terminology it means "if the next word would overflow, newline and indent 0".
Setting "format = no", that is, "do not try to format a statement that would cause overflow", might help readability if the original format contained newlines.
Since "format = no" implies that you will have to figure out something readable by yourself,
the
Java code conventions
might help because the "general principles" are applicable to SQL if the too-long line is due to a single expression. They are:
"When an expression will not fit on a single line, break it according to these general principles:
•Break after a comma.
•Break before an operator.
•Prefer higher-level breaks to lower-level breaks.
•Align the new line with the beginning of the expression at the same level on the previous line.
•If the above rules lead to confusing code or to code that’s squished up against the right margin, just indent 8 spaces instead."
Bloggers:
Commonwealth of Pennsylvania Department of Public Welfare, "Oracle PLSQL Coding and Naming Standards", says "It is allowed to overshoot right margin if code cannot be accommodated in 78 characters."
If you have a good GUI client, it will add colour highlighting on your screen.
If you have a good formatter, it will automatically take care of some of the Choices that I raised here. If your group uses Toad, SQL Developer, SQL Server Management Studio, SQLinForm, etc., just accept what they offer.
Tao Klerks and Steve Culshaw have come up with a github.com page "SQL Formatter comparisons" which lists more formatters than I dreamed existed, and some of them are open source, and some of them are online.
Google "sql lint" and you will see a few more products that will advise but not change.
Bloggers:
Lester I. McCann (writing about C) says: "What about programs that will automatically indent source code for you? Such "crutches" do exist, but I discourage their use on code that is being written from scratch because you are likely to develop bad habits if you begin to rely on the program to fix your mistakes for you."
The final section of this guide is a copy of the GNU General Public License (GPL). It is possible to use GPL for other than software, as explained at the gnu.org FAQ. FLOSS manuals are under GPL. So is this document.
All references to "this program" or "software" mean "this guide", the Descriptive SQL Style Guide.
GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.