Lua Parsing and Code Generation Internals

Stack and Registers

Lua employs two stacks. The Callinfo stack tracks activation frames. There is the secondary stack L->stack that is an array of TValue objects. The Callinfo objects index into this array. Registers are basically slots in the L->stack array.

When a function is called - the stack is setup as follows:

stack
|            function reference
|  base->    parameter 1
|            ...
|            parameter n
|            local 1
|            ...
|            local n
|  top->
|
V

So top is just past the registers needed by the function. The number of registers is determined based on locals and temporaries.

The base of the stack is set to just past the function reference - i.e. on the first parameter or register. All register addressing is done as offset from base - so R(0) is at base+0 on the stack.

Drawing of Lua Stack

The figure shows how the stack is related to other Lua objects.

A description of the stack and registers from Mike Pall on Lua mailing list is reproduced below.

Sliding Register Window - by Mike Pall

Note: this is a reformatted version of a post on Lua mailing list (see MP6 link below).

The Lua 5 VM employs a sliding register window on top of a stack. Frames (named CallInfo aka ‘ci’ in the source) occupy different (overlapping) ranges on the stack. Successive frames are positioned exactly over the passed arguments (luaD_precall). The compiler ensures that there are no live variables after the arguments for a call. Return values need to be copied down (with truncate/extend) to the slot holding the function object (luaD_poscall). This is because the compiler has no idea how many values another function may return – only how many need to be stored.

Example:

function f2(arg1, arg2, ..., argN)
  local local1, local2, ...
  ...
  return ret1, ret2, ..., retO
end

function f1(arg1, arg2, ..., argM)
  local local1, local2, ...
  ...
  local ret1, ret2, ..., retP = f2(arg1, arg2, ..., argN)
  ...
end

Simplified stack diagram:

stack
|
| time: >>>> call >>>>>>>>>>>>>>>>>> call ~~~~~~~~~~~~~~~~~~ return >>>>>
|
|   ciX.func-> f1      f1      f1                                 f1
|   ciX.base-> arg1    arg1    arg1                               arg1
|              arg2    arg2    arg2                               arg2
|              ...     ...     ...                                ...
|              argM    argM    argM                               argM
|   ciX.topC->         local1  local1                             local11
|                      local2  local2                             local2
|                      local3  local3                             local3
|                      ...     ...                                ...
|                              f2      ciY.func-> f2      f2      ret1
|                              arg1    ciY.base-> arg1    arg1    ret2
|                              arg2               arg2    arg2    ...
|                              ...                ...     ...     retP
|                              argN               argN    argN
|   ciX.topL-> ------  ------  ------  ciY.topC-> local1  local1
|                                                 local2  local2
|                                                 ...     ...
|                                                         ret1
|                                                         ret2
|                                                         ...
|                                                         retO
|                                      ciY.topL-> ------  ------
V

Note that there is only a single ‘top’ for each frame:

For Lua functions the top (tagged topL in the diagram) is set to the base plus the maximum number of slots used. The compiler knows this and stores it in the function prototype. The top pointer is used only temporarily for handling variable length argument and return value lists.

For C functions the top (tagged topC in the diagram) is initially set to the base plus the number of passed arguments. C functions can access their part of the stack via Lua API calls which in turn change the stack top. C functions return an integer that indicates the number of return values relative to the stack top.

In reality things are a bit more complex due to overlapped locals, block scopes, varargs, coroutines and a few other things. But this should get you the basic idea.

Parsing and Code Generation

The parser and code generator are arguably the most complex piece in the whole of Lua. The parser is one-pass - and generates code as it parses. That is, there is no AST build phase. This is primarily for efficiency it seems. The parser uses data structures on the stack - there are no heap allocated structures. Where needed the C stack itself is used to build structures - for example, as the assignment statement is parsed, there is recursion, and a stack based structure is built that links to structures in the call stack.

The main object used by the parser is the struct expdesc:

typedef struct expdesc {
  expkind k;
  union {
    struct {  /* for indexed variables (VINDEXED) */
      short idx;  /* index (R/K) */
      lu_byte t;  /* table (register or upvalue) */
      lu_byte vt;  /* whether 't' is register (VLOCAL) or upvalue (VUPVAL) */
    } ind;
    int info;  /* for generic use */
    lua_Number nval;  /* for VKFLT */
    lua_Integer ival;    /* for VKINT */
  } u;
  int t;  /* patch list of 'exit when true' */
  int f;  /* patch list of 'exit when false' */
  int ravi_type; /* RAVI change: type of the expression if known, else LUA_TNONE */
} expdesc;

The code is somewhat hard to follow as the expdesc objects go through various states and are also reused when needed.

As the parser generates code while parsing it needs to go back and patch the generated instructions when it has more information. For example when a function call is parsed the parser assumes that only 1 value is expected to be returned - but later this is patched when more information is available. The most common example is when the register where the value will be stored (operand A) is not known - in this case the parser later on updates this operand in the instruction. I believe jump statements have similar mechanics - however I have not yet gone through the details of these instructions.

Handling of Stack during parsing

Functions have a register window on the stack. The stack is represented in LexState->dyd.actvar (Dyndata) structure (see llex.h). The register window of the function starts from LexState->dyd.actvar.arr[firstlocal].

The ‘active’ local variables of the function extend up to LexState->dyd.actvar.arr[nactvar-1]. Note that when parsing a local declaration statement the nactvar is adjusted at the end of the statement so that during parsing of the statement the nactvar covers locals up to the start of the statement. This means that local variables come into scope (become ‘active’) after the local statement ends. However, if the local statement defines a function then the variable becomes ‘active’ before the function body is parsed.

A tricky thing to note is that while nactvar is adjusted at the end of the statement - the ‘stack’ as represented by LexState->dyd.actvar.arr is extended to the required size as the local variables are created by new_localvar().

When a function is the topmost function being parsed, the registers between LexState->dyd.actvar.arr[nactvar] and LexState->dyd.actvar.arr[freereg-1] are used by the parser for evaluating expressions - i.e. these are part of the local registers available to the function

Note that function parameters are handled as locals.

Example of what all this mean. Let’s say we are parsing following chunk of code:

function testfunc()
  -- at this stage 'nactvar' is 0 (no active variables)
  -- 'firstlocal' is set to current top of the variables stack
  -- LexState->dyd.actvar.n (i.e. excluding registers used for expression evaluation)
  -- LexState->dyd.actvar.n = 0 at this stage
  local function tryme()
    -- Since we are inside the local statement and 'tryme' is a local variable,
    -- the LexState->dyd.actvar.n goes to 1. As this is a function definition
    -- the local variable declaration is deemed to end here, so 'nactvar' for testfunc()
    -- is gets set to 1 (making 'tryme' an active variable).
    -- A new FuncState is created for 'tryme' function.
    -- The new tryme() FunState has 'firstlocal' set to value of LexState->dyd.actvar.n, i.e., 1
    local i,j = 5,6
    -- After 'i' is parsed, LexState->dyd.actvar.n = 2, but 'nactvar' = 0 for tryme()
    -- After 'j' is parsed, LexState->dyd.actvar.n = 3, but 'nactvar' = 0 for tryme()
    -- Only after the full statement above is parsed, 'nactvar' for tryme() is set to '2'
    -- This is done by adjustlocalvar().
    return i,j
  end
  -- Here two things happen
  -- Firstly the FuncState for tryme() is popped so that
  -- FuncState for testfunc() is now at top
  -- As part of this popping, leaveblock() calls removevars()
  -- to adjust the LexState->dyd.actvar.n down to 1 where it was
  -- at before parsing the tryme() function body.
  local i, j = tryme()
  -- After 'i' is parsed, LexState->dyd.actvar.n = 2, but 'nactvar' = 1 still
  -- After 'j' is parsed, LexState->dyd.actvar.n = 3, but 'nactvar' = 1 still
  -- At the end of the statement 'nactvar' is set to 3.
  return i+j
end
-- As before the leaveblock() calls removevars() which resets
-- LexState->dyd.actvar.n to 0 (the value before testfunc() was parsed)

A rough debug trace of the above gives:

function testfunc()
  -- open_func -> fs->firstlocal set to 0 (ls->dyd->actvar.n), and fs->nactvar reset to 0
  local function tryme()
    -- new_localvar -> registering var tryme fs->f->locvars[0] at ls->dyd->actvar.arr[0]
    -- new_localvar -> ls->dyd->actvar.n set to 1
    -- adjustlocalvars -> set fs->nactvar to 1
    -- open_func -> fs->firstlocal set to 1 (ls->dyd->actvar.n), and fs->nactvar reset to 0
    -- adjustlocalvars -> set fs->nactvar to 0 (no parameters)
    local i,j = 5,6
    -- new_localvar -> registering var i fs->f->locvars[0] at ls->dyd->actvar.arr[1]
    -- new_localvar -> ls->dyd->actvar.n set to 2
    -- new_localvar -> registering var j fs->f->locvars[1] at ls->dyd->actvar.arr[2]
    -- new_localvar -> ls->dyd->actvar.n set to 3
    -- adjustlocalvars -> set fs->nactvar to 2
    return i,j
    -- removevars -> reset fs->nactvar to 0
  end
  local i, j = tryme()
  -- new_localvar -> registering var i fs->f->locvars[1] at ls->dyd->actvar.arr[1]
  -- new_localvar -> ls->dyd->actvar.n set to 2
  -- new_localvar -> registering var j fs->f->locvars[2] at ls->dyd->actvar.arr[2]
  -- new_localvar -> ls->dyd->actvar.n set to 3
  -- adjustlocalvars -> set fs->nactvar to 3
  return i+j
  -- removevars -> reset fs->nactvar to 0
end

Notes on Parser by Sven Olsen

“discharging” expressions

“discharging” takes an expression of arbitrary type, and converts it to one having particular properties.

the lowest-level discharge function is discharge2vars (), which converts an expression into one of the two “result” types; either a VNONRELOC or a VRELOCABLE.

if the variable in question is a VLOCAL, discharge2vars will simply change the stored type to VNONRELOC.

much of lcode.c assumes that the it will be working with discharged expressions. in particular, it assumes that if it encounters a VNONRELOC expression, and e->info < nactvar, then the register referenced is a local, and therefore shouldn’t be implicitly freed after use.

local variables

however, the relationship between nactvar and locals is actually somewhat more complex – as each local variable appearing in the code has a collection of data attached to it, data that’s being accumulated and changed as the lexer moves through the source.

fs->nlocvars stores the total number of named locals inside the function – recall that different local variables are allowed to overlap the same register, depending on which are in-scope at any particular time.

the list of locals that are active at any given time is stored in ls->dyd – a vector of stack references that grows or shrinks as locals enter or leave scope.

managing the lifetime of local variables involves several steps. first, new locals are declared using new_localvar. this sets their names and creates new references in dyd. soon thereafter, the parser is expected to call adjustlocalvar(ls,nvars), with nvars set to the number of new locals. adjustlocalvar increments fs->nactvar by nvars, and marks the startpc’s of all the locals.

note that neither new_localvar or adjustlocalvar ensures that anything is actually inside the registers being labeled as locals. failing to initialize said registers is an easy way to write memory access bugs (peter’s original table unpack patch includes one such).

after adjustlocalvar is called, luaK_exp2nextreg() will no longer place new data inside the local’s registers – as they’re no longer part of the temporary register stack.

when the time comes to deactivate locals, that’s done via removevars(tolevel). tolevel is assumed to contain nactvars as it existed prior to entering the previous block. thus, the number of locals to remove should simply be fs->nactvar-tolevel. removevars(tolevel) will decrement nactvars down to tolevel. it also shrinks the dyd vector, and marks the endpc’s of all the removed locals.

except in between new_localvar and adjustlocalvar calls, i believe that:

fs->ls->dyd->actvar.n - fs->firstlocal == fs->nactvar

temporary registers

freereg is used to manage the temporary register stack – registers between [fs->nactvars,fs->freereg) are assumed to belong to expressions currently being stored by the parser.

fs->freereg is incremented explicitly by calls to luaK_reserveregs, or implicitly, inside luaK_exp2nextreg. it’s decremented whenever a freereg(r) is called on a register in the temporary stack (i.e., a register for which r >= fs->nactvar).

the temporary register stack is cleared when leaveblock() is called, by setting fs->freereg=fs->nactvar. it’s also partially cleared in other places – for example, inside the evaluation of table constructors.

note that freereg just pops the top of the stack if r does not appear to be a local – thus it doesn’t necessarily, free r. one of the important sanity checks that you’ll get by enabling lua_assert() checks that the register being freed is also the top of the stack.

when writing parser patches, it’s your job to ensure that the registers that you’ve reserved are freed in an appropriate order.

when a VINDEXED expression is discharged, freereg() will be called on both the table and the index register. otherwise, freereg is only called from freeexp() – which gets triggered anytime an expression has been “used up”; typically, anytime it’s been transformed into another expression.

State Transitions

The state transitions for expdesc structure are as follows:

expkind Description State Transitions
VVOID This is used to indicate the lack of value - e.g. function call with no arguments, the rhs of local variable declaration, and empty table constructor None
VRELOCABLE This is used to indicate that the result from expression needs to be set to a register. The operation that created the expression is referenced by the u.info parameter which contains an offset into the code of the function that is being compiled So you can access this instruction by calling getcode(FuncState *, expdesc *) The operations that result in a VRELOCABLE object include OP_CLOSURE OP_NEWTABLE OP_GETUPVAL OP_GETTABUP OP_GETTABLE OP_NOT and code for binary and unary expressions that produce values (arithmetic operations, bitwise operations, concat, length). The associated code instruction has operand A unset (defaulted to 0) - this the VRELOCABLE expression must be later transitioned to VNONRELOC state when the register is set. In terms of transitions the following expression kinds convert to VRELOCABLE: VVARARG VUPVAL (OP_GETUPVAL VINDEXED (OP_GETTABUP or OP_GETTABLE And following expression states can result from a VRELOCABLE expression: VNONRELOC which means that the result register in the instruction operand A has been set.
VNONRELOC This state indicates that the output or result register has been set. The register is referenced in u.info parameter. Once set the register cannot be changed for this expression; subsequent operations involving this expression can refer to the register to obtain the result value. As for transitions, the VNONELOC state results from VRELOCABLE after a register is assigned to the operation referenced by VRELOCABLE. Also a VCALL expression transitions to VNONRELOC expression - u.info is set to the operand A in the call instruction. VLOCAL VNIL VTRUE VFALSE VK VKINT VKFLT and VJMP expressions transition to VNONRELOC.
VLOCAL This is used when referencing local variables. u.info is set to the local variable’s register. The VLOCAL expression may transition to VNONRELOC although this doesn’t change the u.info parameter.
VCALL This results from a function call. The OP_CALL instruction is referenced by u.info parameter and may be retrieved by calling getcode(FuncState *, expdesc *). The OP_CALL instruction gets changed to OP_TAILCALL if the function call expression is the value of a RETURN statement. The instructions operand C gets updated when it is known the number of expected results from the function call. In terms of transitions, the VCALL expression transitions to VNONRELOC When this happens the result register in VNONRELOC (u.info is set to the operand A in the OP_CALL instruction.
VINDEXED This expression represents a table access. The u.ind.t parameter is set to the register or upvalue? that holds the table, the u.ind.idx is set to the register or constant that is the key, and u.ind.vt is either VLOCAL or VUPVAL The VINDEXED expression transitions to VRELOCABLE When this happens the u.info is set to the offset of the code that contains the opcode OP_GETTABUP if u.ind.vt was VUPVAL or OP_GETTABLE if u.ind.vt was VLOCAL

Examples of Parsing

example 1

We investigate the simple code chunk below:

local i,j; j = i*j+i

The compiler allocates following local registers, constants and upvalues:

constants (0) for 0000007428FED950:
locals (2) for 0000007428FED950:
      0       i       2       5
      1       j       2       5
upvalues (1) for 0000007428FED950:
      0       _ENV    1       0

Some of the parse steps are highlighted below.

Reference to variable i which is located in register 0. The p here is the pointer address of expdesc object so you can see how the same object evolves:

{p=0000007428E1F170, k=VLOCAL, register=0}

Reference to variable j located in register 1:

{p=0000007428E1F078, k=VLOCAL, register=1}

Now the MUL operator is applied so we get following. Note that the previously VLOCAL expression for i is now VNONRELOC:

{p=0000007428E1F170, k=VNONRELOC, register=0} MUL {p=0000007428E1F078, k=VLOCAL, register=1}

Next code gets generated for the MUL operator and we can see that first expression is replaced by a VRELOCABLE expression. Note also that the MUL operator is encoded in the VRELOCABLE expression as instruction 1 which is decoded below:

{p=0000007428E1F170, k=VRELOCABLE, pc=1, instruction=(MUL A=0 B=0 C=1)}

Now a reference to i is again required:

{p=0000007428E1F078, k=VLOCAL, register=0}

And the ADD operator must be applied to the result of the MUL operator and above. Notice that a temporary register 2 has been allocated to hold the result of the MUL operator, and also notice that as a result the VRELOCABLE has now changed to VNONRELOC:

{p=0000007428E1F170, k=VNONRELOC, register=2} ADD {p=0000007428E1F078, k=VLOCAL, register=0}

Next the result of the ADD expression gets encoded similarly to MUL earlier. As this is a VRELOCABLE expression it will be later on assigned a result register:

{p=0000007428E1F170, k=VRELOCABLE, pc=2, instruction=(ADD A=0 B=2 C=0)}

Eventually above gets assigned a result register and becomes VNONRELOC (not shown here) - and so the final generated code looks like below:

main <(string):0,0> (4 instructions at 0000007428FED950)
0+ params, 3 slots, 1 upvalue, 2 locals, 0 constants, 0 functions
      1       [1]     LOADNIL         0 1
      2       [1]     MUL             2 0 1
      3       [1]     ADD             1 2 0
      4       [1]     RETURN          0 1