Month: January 2020


Convert SQL Stored Procedures to Lua Functions

I have written code that converts SQL stored procedures to Lua functions. This might interest people who want to know what’s involved for any C-like target language. Here I will display one step at a time, emphasizing the “Design Decisions” that I had to make, giving increasingly complex examples. Then I’ll note the limitations and say how to get the source code. Alpha.

Frame

CREATE LUA PROCEDURE p()
BEGIN
  /* Empty block */
END;

The word LUA is just a signal that this is a translation job; everything following it is expected to be standard SQL or a supported dialect.

I already had a good
recognizer so parsing the input is easy. And, so far, it’s easy to find Lua syntax analogous to SQL syntax. The result is:

function P()
do 
  --[[/* Empty block */]]
end; 
end;

Design Decision #1: I’ll try to preserve comments and punctuation as in the original, although it’s not necessary.

Design Decision #2: SQL’s BEGIN … END can become Lua’s do … end
The alternative was Lua’s while true … break; end but that could have confused some people.

Variables

CREATE LUA PROCEDURE p()
BEGIN
  DECLARE i INTEGER;
  DECLARE "Peter's string" VARCHAR(4);
END;

DECLARE + variable-name is easy because Lua declarations only need a name. Occasionally we have to do some extra processing though.

Result:

do 
 local I;
 local Peter_39_s_32_string_33_;
end; 
end;

Design Decision #3: I convert regular identifiers to upper case because that’s how SQL does things. This isn’t common practice in Lua.

Design Decision #4: I change delimited (quoted) identifiers by changing all special characters to _s but keeping the original case, and then we add a number to ensure the name is unique. Detail: _33 means “the definition was at token number 33 in the CREATE PROCEDURE statement; I use token numbers frequently to make sure I’m making something unique.

Design Decision #5: I don’t preserve the data type. Lua doesn’t need it often and when it does I can use Lua’s type() function to see whether somethingis a number or a string … except if the value is nil. Nil values will confuse us and users should try to avoid them. The alternative, though, was to store the data type and null status separately for every declared variable, and I think (well, “hope”) that won’t be necessary in practice.

Executing

CREATE LUA FUNCTION f() RETURNS INTEGER
BEGIN
  DECLARE i INTEGER DEFAULT 1;
  RETURN i + 3;
END;

Now we get into SQL statements that must be called from Lua. For this we will use a function named sql_execute(), which is a wrapper around code that is specific to a particular DBMS. The job of sql_execute is to:
Pass parameters.
Send a statement to the DBMS server (for example with mysql_real_query if the DBMS is MySQL/MariaDB).
NB: sending should be done with Lua pcall because results must always be intercepted.
Check return values:
(If there was an error) set sqlstate to error, for example ‘45000’, and return false.
(If there was a result set)
Copy the result set to sqlresult.
(If the result set is empty) set sqlstate to not found, for example ‘02000’, and return nil.
(If the result set is not empty) set sqlstate to okay, for example ‘00000’, and return true.
(If there was no result set) set sqlstate to okay, for example ‘00000’, and return true.
The job of the function calling sql_execute is to:
Define sqlstate (a string) and define sqlresult (a table) accessible throughout the function.
Call sql_execute with whatever parameters are necessary.
Check what sql_execute returns, if necessary.

Result:

function F() 
            local sqlstate;
            local sqlresult = {};
            local sqlmessage;
            local function sql_execute(statement, parameters)
                --[[not illustrated here, see description]]
                end;
  
do 
  local I; sql_execute([[SELECT 1;
  ]],{}); I = sqlresult[1][1];
  sql_execute([[
  SELECT ? + 3;
  ]],{I}); return sqlresult[1][1];
end; ::end_8:: 
end;

For the DEFAULT 1 clause, the function will call on the DBMS to execute “SELECT 1;”, and the result will be in the first column of the first row of sqlresult table, so I = select-result. There are no parameters in this case, which is why ,{} appears.

For the RETURN i + 3 statement, the function will call on the DBMS to execute “SELECT ? + 3;”, and since the ? is a placeholder for a parameter there will have to be a parameter, which is why ,{I} appears. Once again sql_execute() will put the result set, which happens to be one row with one column, in sqlresult. So return sqlresult[1][1] will cause the function to return 4.

Design Decision #6: all expressions will go to SQL for evaluation. Certainly for the example here it would be possible to do the addition in Lua, but when expressions contain SQL functions or operators we have to get SQL to do it. So, for simplicity, I always ask SQL to do it.

Design Decision #7: I’ll assume that sql_execute can’t fail for statements that are assigning, like DEFAULT and RETURN. This isn’t necessarily true, but I figured: if there was something wrong with the expression, it would have failed already.

Getting Out of a block

CREATE LUA PROCEDURE p()
BEGIN
  BEGIN
    CREATE TABLE t (s1 VARCHAR(4) PRIMARY KEY);
    INSERT INTO t VALUES ('a');
    INSERT INTO t VALUES ('b');
  END;
END;

Adding statements in a BEGIN … END causes the usual sql_execute() calls, but also some goto instructions.

Result:

function P()
local sqlstate;
local sqlresult = {};
local sqlmessage;
local function sql_execute(statement, parameters)
    --[[See earlier description.]]
    end;
do 
  do 
    sql_execute([[
    CREATE TABLE t (s1 VARCHAR(4) PRIMARY KEY);
    ]],{});
    if string.sub(sqlstate,1,2) > '02' then  goto end_7; end;
    sql_execute([[
    INSERT INTO t VALUES ('a');
    ]],{});
    if string.sub(sqlstate,1,2) > '02' then  goto end_7; end;
    sql_execute([[
    INSERT INTO t VALUES ('b');
    ]],{});
  end; ::end_7:: 
end; ::end_6:: 
end;

The instruction

    if string.sub(sqlstate,1,2) > '02' then  goto end_7; end;

is my substitute for SQL’s implicit action
“Get out of the BEGIN … END block if there is an error”.
Remember that sql_execute() returns an sqlstate value, and the first two characters of this value must be greater than ’02’ if there is an error. So the goto will go to a point just past the end of the inner Lua do … end block, which is marked by a label ::end_7::. This isn’t necessary for INSERT INTO t VALUES (‘b’); because it is the last thing in the block, so it will pass out of the block anyway.

Design Decision #8: goto is the best way to get out of the block. The alternatives were: (a) use while true … end so that break will get out of the block, as discussed earlier, (b) put “if string.sub(sqlstate,1,2) <= '02'" before every sql_execute() call in the block. Although the alternatives would work in this example, I decided they make the code unreadable when the examples get complicated.

Unconditional Flow control

CREATE LUA PROCEDURE p()
BEGIN
  label_1:
  LOOP
    ITERATE label_1;
    LEAVE label_1;
  END LOOP;
END;

I can replace SQL’s

LOOP ... END LOOP

with Lua’s

while true ... end

. I can replace SQL’s ITERATE with a goto that goes to just before the end, so that the loop repeats. I can replace SQL’s LEAVE with a goto that goes to just after the end, so that the loop ends.

Result:

function P()
local sqlstate;
local sqlresult = {};
local sqlmessage;
local function sql_execute(statement, parameters)
    --[[See earlier description.]]
    end;
do 
  while true do 
    goto LABEL_1_1;
    goto LABEL_1_2;
  ::LABEL_1_1::end;::LABEL_1_2::
end; ::end_6:: 
end;

Since the ITERATE-derived “goto LABEL_1_1;” skips past the LEAVE-derived “goto LABEL_1_2;” this is an infinite loop. Its only saving grace is that it is valid Lua code.

Conditional Flow Control

CREATE LUA PROCEDURE p()
BEGIN
  DECLARE i INTEGER DEFAULT 1;
  WHILE i < 5 DO
    INSERT INTO t VALUES (i);
    SET i = i + 1;
  END WHILE;
END;

WHILE, IF, and REPEAT are statements that depend on a condition. I support them all the same way: by passing "SELECT condition;" to SQL, and then (since such a statement will always return one row with a Boolean value) asking whether the first column in the first row of sqlresult is true.

Result:

function P()
local sqlstate;
local sqlresult = {};
local sqlmessage;
local function sql_execute(statement, parameters)
    --[[See earlier description.]]
    end;
do 
  local I; sql_execute([[SELECT 1;
  ]],{}); I = sqlresult[1][1];
  while sql_execute([[
        SELECT ? < 5 ;
        ]],{I}) == true and sqlresult[1][1] == true do 
    sql_execute([[
    INSERT INTO t VALUES (?);
    ]],{I});
    if string.sub(sqlstate,1,2) > '02' then  goto end_6; end;
    sql_execute([[
    SELECT ? + 1;
    ]],{I})
    if string.sub(sqlstate,1,2) > '02' then  goto end_6; end;
    I = sqlresult[1][1];
  end;
end; ::end_6:: 
end;

We've already seen what the rest of the statements in this loop are supposed to generate, the only new thing is that "SELECT ? < 5;" will return true (because the passed parameter I is less than 5) until I becomes 5 (because "SELECT ? + 1;" is inside the loop).

Design Decision #9: I decided to test first if sql_execute() returns true (that is, does not return an error or not found) before checking whether the value in the result set is true. That's reasonable caution -- but I didn't decide to do a similar test for assignment. That's because a failure during condition evaluation could cause an infinite loop, so it is more serious.

Design Decision #10: I'm saying "SELECT ? < 5;" although I dislike non-standard code -- standard code is "VALUES (? < 5);". This is my concession to what seems to be popular, but I am already regretting it, I probably will change this.

Cursors

CREATE LUA FUNCTION f() RETURNS INTEGER
BEGIN
  DECLARE i INTEGER;
  BEGIN
    DECLARE c CURSOR FOR SELECT 5;
    OPEN c;
    FETCH c INTO i;
    CLOSE c;
  END;
  RETURN i;
END;

Remember that sql_execute() will put the results of a SELECT into a table named sqlresult. So all I need to do is call sql_execute("SELECT 5;") for the OPEN, pick up a value from sqlresult for the FETCH statement, and rub out sqlresult for the CLOSE statement.

Result:

function F()
local sqlstate;
local sqlresult = {};
local sqlmessage;
local function sql_execute(statement, parameters)
    --[[See earlier description.]]
    end;
do 
  local I;
  do 
    local C_CURSOR = {};
    local C_CURSOR_OFFSET = 0;
    local C_CURSOR_STATUS = 'not open';
     
    C_CURSOR = sql_execute([[
    SELECT 5;
    ]],{});
    if string.sub(sqlstate,1,2) > '02' then  goto end_13; end;
    if string.sub(sqlstate,1,2) == '00' or string.sub(sqlstate,1,2) == '02' then
    C_CURSOR_OFFSET = 0;
    C_CURSOR_STATUS = 'open';
    end;

    if C_CURSOR_STATUS ~= 'open' then sqlstate = '07000';
    else if C_CURSOR_OFFSET >= #C_CURSOR then sqlstate = '02000';
    else do
       C_CURSOR_OFFSET = C_CURSOR_OFFSET + 1;
        I = C_CURSOR[C_CURSOR_OFFSET][1];
    end; end; end;
    
    C_CURSOR= {};
    C_CURSOR_OFFSET = 0;
    C_CURSOR_STATUS = 'not open';
  end; ::end_13:: 
  if string.sub(sqlstate,1,2) > '02' then  goto end_8; end;
  sql_execute([[
  SELECT ?;
  ]],{I}); return sqlresult[1][1];
end; ::end_8:: 
end;

Here C_CURSOR is a Lua table with a copy of resultset, and C_CURSOR_OFFSET is something that FETCH can increment whenever it succeeds. FETCH will make its own decision about whether sqlstate = '02000' (which is the SQLSTATE value for NOT FOUND) by checking whether C_CURSOR_OFFSET is greater or equal to the Lua table's size.

Design Decision #11: This is a normal way to get a result set. MySQL/MariaDB users are accustomed to seeing SELECT display results if it is in a routine. I could do the same by looping through sqlresult and calling Lua's print() function, but decided that's not what everyone would expect.

Handlers

CREATE LUA PROCEDURE p()
BEGIN
  DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
    DROP TABLE t;
  SIGNAL SQLSTATE '45000'; /* SIGNAL SQLSTATE '45000'; */
END;

I am no longer looking at something easy, because Lua has no equivalent for a handler. I struggled a lot before coming up with the idea of a nested function. After that I struggled a lot more with handlers that have multiple conditions, EXIT handlers, multiple handlers that need to be sorted according to how specific they are, and handlers in blocks that encompass the block that the statement is in. Those complexities are all taken care of now, by generating complex code. Here I'll show only the simplest case.

Result:

do 
  local function handler_0()
    sql_execute([[
    DROP TABLE t;
    ]],{}); end;
  sqlstate = '45000';
  if string.sub(sqlstate,1,2) > '02' then handler_0(); end;
end; ::end_6:: 
end;

Here, afer sqlstate = '45000'; which was generated for the SIGNAL statement, we again check to see whether there was an error by looking at sqlstate, but this time, instead of jumping out of the BEGIN/END block when that happens, the function calls handler_0() which has the code generated for DECLARE CONTINUE HANDLER.

Design Decision #12: Defining a function within a function ("nesting" the function) is great, because the generated code ends up in the same relative position that it occupies in the SQL source. However, this makes me depend on a Lua feature. Also, I cannot LEAVE from the handler code into the main code. Nevertheless, the alternative -- putting an indefinite number of conditional goto statements after the handler code -- does not inspire.

Illustration

The following screenshots show a run of the last example, In the second screenshot the code of sql_execute() appears instead of the stub that I've been showing, but it only works with Tarantool. As I said earlier, users have to modify the sql_execute() code to fit their DBMS dialect.

Screenshot #1: showing entry of CREATE LUA PROCEDURE.

Screenshot #2: showing dialog box with result Lua code.

These were done with ocelotgui. I have done no retouching, which is why the indenting is a bit rough. Before entering, I had to ensure that the syntax checker was running ...
SET OCELOT_STATEMENT_SYNTAX_CHECKER = '3';
(Turning on syntax checking invokes the recognizer, without which nothing would work.)
Also, for MySQL/MariaDB only:
SET SESSION SQL_MODE='ansi_quotes';

Limitations

As I said, it's alpha.
Sure, bugs exist and I haven't tested much, because I want to get the design straight, which is why I've focused in this blog post on "Design Decisions". Some particular matters:
CASE, GET, RESIGNAL don't work.
No support for OUT or INOUT parameters.
Only SQL/PSM, not PL/SQL (ocelotgui supports PL/SQL but CREATE LUA does not.)
No handlers within handlers.
No ITERATE or LEAVE to get out of handlers.

Really, this might be useful for any DBMS, even one that doesn't support SQL/PSM, since SQL/PSM is what I took care of. But for a DBMS that we don't support, you'll have to connect to either MySQL/MariaDB or Tarantool and enter generic standard SQL statements, then change to your dialect's SQL statements after you have the Lua function.

Where is this code

I pushed the source code to github.com/ocelot-inc/ocelotgui. It is part of ocelotgui. It has to be part of ocelotgui because we need ocelotgui's recognizer. So anyone wanting to alpha-test would have to:
download the ocelotgui source
build as instructed in the README
start ocelotgui and connect
make sure syntax checking is on, ansi_quotes is off, oracle mode is off
enter a CREATE LUA PROCEDURE or CREATE LUA FUNCTION statement
look at the resulting dialog box
repeat.

The code of the new feature is mostly in file ocelotgui.cpp, function clf().

I want to know whether anyone agrees with the approach. That will affect whether I eventually move the feature to beta.
Comment on this post, or (if the comment period has expired) add a feature request on github/issues, or write to pgulutzan at-sign ocelot.ca.