LLVM First Steps

Note that the discussion below is for LLVM 3.5.1.

Although there appears to be a lot of documentation on the LLVM site, some basic information is surprisingly hard to find. The main source of guidance for creating a JIT is the example toy language Kaleidoscope. But here too there are several versions, so you have to pick the one that is compatible with the LLVM version you are using.

A JITed Lua function executes in the context of Lua, so it needs to be able to access the lua_State and its various structures. I therefore wanted a sample that demonstrates passing a pointer to a structure and accessing it within the JITed function.

The initial test program I created is meant to be a “hello world” type test, but one that covers the functionality described above. The test I want to run is:

// Creating a function that takes pointer to struct as argument
// The function gets value from one of the fields in the struct
// And returns it
// The equivalent C program is:
//
// extern int printf(const char *, ...);
//
// struct GCObject {
//   struct GCObject *next;
//   unsigned char a;
//   unsigned char b;
// };
//
// int testfunc(struct GCObject *obj) {
//   printf("value = %d\n", obj->a);
//   return obj->a;
// }

You can view the test program at test_llvm.cpp. It is also reproduced below.

I used the new MCJIT engine in my test. It seems that this engine compiles modules rather than individual functions, and once compiled a module cannot be modified. So in the Lua context we need to either create a new module every time we JIT compile a function, or JIT compile a whole Lua source file, including all its functions, into a single module.

I found the blog post Using MCJIT with Kaleidoscope useful in understanding some finer points about using MCJIT.

The Lua GCObject structure in lobject.h we need is:

typedef struct RaviGCObject {
  struct RaviGCObject *next;
  unsigned char b1;
  unsigned char b2;
} RaviGCObject;

Our prototype for the JITed function:

typedef int (*myfunc_t)(RaviGCObject *);

Get the global context (I am not sure what the impact of sharing it is):

llvm::LLVMContext &context = llvm::getGlobalContext();

Module is the translation unit:

std::unique_ptr<llvm::Module> theModule =
  std::unique_ptr<llvm::Module>(new llvm::Module("ravi", context));
llvm::Module *module = theModule.get();
llvm::IRBuilder<> builder(context);

On Windows we get an error saying incompatible object format. Reading posts on mailing lists, I found that the issue is that the COFF format is not supported, and therefore we need to set -elf as the object format:

#ifdef _WIN32
  auto triple = llvm::sys::getProcessTriple();
  module->setTargetTriple(triple + "-elf");
#endif

Create a GCObject structure as defined in lobject.h:

llvm::StructType *structType =
    llvm::StructType::create(context, "RaviGCObject");
llvm::PointerType *pstructType =
  llvm::PointerType::get(structType, 0); // pointer to RaviGCObject
std::vector<llvm::Type *> elements;
elements.push_back(pstructType);
elements.push_back(llvm::Type::getInt8Ty(context));
elements.push_back(llvm::Type::getInt8Ty(context));
structType->setBody(elements);
structType->dump();

Create printf declaration:

std::vector<llvm::Type *> args;
args.push_back(llvm::Type::getInt8PtrTy(context));
// accepts a char*, is vararg, and returns int
llvm::FunctionType *printfType =
    llvm::FunctionType::get(builder.getInt32Ty(), args, true);
llvm::Constant *printfFunc =
    module->getOrInsertFunction("printf", printfType);

Create the testfunc():

args.clear();
args.push_back(pstructType);
llvm::FunctionType *funcType =
  llvm::FunctionType::get(builder.getInt32Ty(), args, false);
llvm::Function *mainFunc = llvm::Function::Create(
  funcType, llvm::Function::ExternalLinkage, "testfunc", module);
llvm::BasicBlock *entry =
  llvm::BasicBlock::Create(context, "entrypoint", mainFunc);
builder.SetInsertPoint(entry);

The printf format string:

llvm::Value *formatStr = builder.CreateGlobalStringPtr("value = %d\n");

Get the first argument which is RaviGCObject *:

auto argiter = mainFunc->arg_begin();
llvm::Value *arg1 = argiter++;
arg1->setName("obj");

Now we need a GEP for the second field in RaviGCObject:

std::vector<llvm::Value *> values;
llvm::APInt zero(32, 0);
llvm::APInt one(32, 1);
// This is the array offset into RaviGCObject*
values.push_back(
   llvm::Constant::getIntegerValue(llvm::Type::getInt32Ty(context), zero));
// This is the field offset
values.push_back(
  llvm::Constant::getIntegerValue(llvm::Type::getInt32Ty(context), one));

Create the GEP value:

llvm::Value *arg1_a = builder.CreateGEP(arg1, values, "ptr");
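
The two indices are easier to read against plain C++ pointer arithmetic. The sketch below does not use LLVM; the helper names are hypothetical and only illustrate which address a GEP with indices 0 and 1 should compute for this struct: the first index steps over whole objects (obj[0]), the second selects a field within the object.

```cpp
#include <cstddef>

// Mirror of the struct used in the text.
struct RaviGCObject {
  RaviGCObject *next;
  unsigned char b1;
  unsigned char b2;
};

// Hypothetical helper: the C++ expression matching GEP indices {0, 1}.
// Index 0 picks the object the pointer refers to, index 1 picks the
// second field (b1).
inline unsigned char *second_field_address(RaviGCObject *obj) {
  return &obj[0].b1;
}

// Hypothetical helper: the same address computed by hand from the byte
// offset of the field within the struct.
inline unsigned char *second_field_address_by_offset(RaviGCObject *obj) {
  return reinterpret_cast<unsigned char *>(obj) + offsetof(RaviGCObject, b1);
}
```

Both helpers return a pointer to b1, which is why the next step can simply load a byte from the GEP result.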

Now retrieve the data from the pointer address:

llvm::Value *tmp1 = builder.CreateLoad(arg1_a, "a");

As the retrieved value is a byte, zero-extend it to an int i:

llvm::Value *tmp2 =
  builder.CreateZExt(tmp1, llvm::Type::getInt32Ty(context), "i");
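
Since the field is an unsigned char, zero extension is the widening that preserves its value. The following sketch is plain C++ with hypothetical helper names; it contrasts zero extension with sign extension for an 8-bit value, assuming a two's complement reading of the byte in the signed case.

```cpp
#include <cstdint>

// Zero extension: the new high bits are filled with zeros, so an
// unsigned byte keeps its value (0..255).
inline std::int32_t zero_extend(std::uint8_t byte) {
  return static_cast<std::int32_t>(byte);
}

// Sign extension: the byte's top bit is treated as a sign bit, so
// values of 128 and above become negative.
inline std::int32_t sign_extend(std::uint8_t byte) {
  std::int32_t wide = static_cast<std::int32_t>(byte);
  return byte >= 0x80 ? wide - 256 : wide;
}
```

For small values such as 42 the two agree; they differ only once the top bit of the byte is set.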

Call the printf function:

values.clear();
values.push_back(formatStr);
values.push_back(tmp2);
builder.CreateCall(printfFunc, values);

Return i:

builder.CreateRet(tmp2);
module->dump();

Let's create the MCJIT engine:

std::string errStr;
auto engine = llvm::EngineBuilder(module)
                .setErrorStr(&errStr)
                .setEngineKind(llvm::EngineKind::JIT)
                .setUseMCJIT(true)
                .create();
if (!engine) {
  llvm::errs() << "Failed to construct MCJIT ExecutionEngine: " << errStr
             << "\n";
  return 1;
}

Now let's compile our function into machine code:

std::string funcname = "testfunc";
myfunc_t funcptr = (myfunc_t)engine->getFunctionAddress(funcname);
if (funcptr == nullptr) {
  llvm::errs() << "Failed to obtain compiled function\n";
  return 1;
}
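
getFunctionAddress() hands back the address as an integer, which is why the cast to myfunc_t is needed. The sketch below shows that round trip in plain C++ without LLVM. sample_func is a hypothetical stand-in for the JIT-compiled code, and the helper names are my own; converting between function pointers and integers this way is assumed to be supported by the platform, as it is on mainstream compilers.

```cpp
#include <cstdint>

struct RaviGCObject {
  RaviGCObject *next;
  unsigned char b1;
  unsigned char b2;
};

typedef int (*myfunc_t)(RaviGCObject *);

// Hypothetical stand-in for JIT-compiled code with the same signature.
inline int sample_func(RaviGCObject *obj) { return obj->b1; }

// Produce the address as a 64-bit integer, the form the caller receives.
inline std::uint64_t address_of_sample() {
  return static_cast<std::uint64_t>(
      reinterpret_cast<std::uintptr_t>(&sample_func));
}

// Convert an integer address back into a callable function pointer.
inline myfunc_t to_function(std::uint64_t address) {
  return reinterpret_cast<myfunc_t>(static_cast<std::uintptr_t>(address));
}
```

A zero address converts to a null function pointer on the usual platforms, which is what the nullptr check above relies on.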

Run the function and test results:

RaviGCObject obj = {NULL, 42, 65};
int ans = funcptr(&obj);
printf("The answer is %d\n", ans);
return ans == 42 ? 0 : 1;

Accessing extern functions from JIT compiled code

The JITed function needs to access extern Lua functions. We need a way to map these so that they are visible to the JITed code. Simply declaring the functions extern only appears to work if the functions are available as exported symbols in dynamic libraries, e.g. the call to printf above.

From reading posts on the subject it appears that the way to do this is to add a global mapping in the ExecutionEngine by calling the addGlobalMapping() method. However this doesn’t work with MCJIT due to a bug! So we need to use a workaround. Apparently there are two solutions:

  • Create a custom memory manager that resolves the extern functions.
  • Add the symbol to the global symbols by calling llvm::sys::DynamicLibrary::AddSymbol().

I am using the latter approach for now.
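
Conceptually, both workarounds come down to a lookup from symbol name to host address that is consulted when the JITed code references an extern function. The sketch below is a plain C++ model of such a lookup and is not LLVM's API: SymbolRegistry and host_add_one are hypothetical names used only for illustration, and storing a function address in a void * is assumed to be supported by the platform.

```cpp
#include <map>
#include <string>

// Hypothetical registry mapping a symbol name to a host address.
class SymbolRegistry {
public:
  // Register (or replace) the address for a symbol name.
  void add(const std::string &name, void *address) {
    symbols_[name] = address;
  }

  // Return the registered address, or nullptr if the name is unknown.
  void *find(const std::string &name) const {
    std::map<std::string, void *>::const_iterator it = symbols_.find(name);
    return it == symbols_.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, void *> symbols_;
};

// Hypothetical host function that JITed code might want to call.
inline int host_add_one(int x) { return x + 1; }
```

A resolver built this way would register each extern Lua function once at startup and return nullptr for names it does not know, leaving those to the normal dynamic library lookup.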

Memory Management in LLVM

Curiously, the LLVM docs do not say much about how memory should be managed. I am still trying to figure this out, but in general it seems that there is a hierarchy of ownership. For example, the ExecutionEngine owns the Module. Deleting the parent automatically deletes the objects it owns.
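
The ownership pattern described here can be modelled in plain C++. The sketch below uses hypothetical ToyModule and ToyEngine classes, not the LLVM ones, and counts live modules to show that destroying the owner also destroys what it owns.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a module; it tracks how many are alive.
struct ToyModule {
  explicit ToyModule(int *live) : live_(live) { ++*live_; }
  ~ToyModule() { --*live_; }
  int *live_;
};

// Hypothetical stand-in for an engine that takes ownership of a module.
class ToyEngine {
public:
  explicit ToyEngine(std::unique_ptr<ToyModule> module)
      : module_(std::move(module)) {}

private:
  std::unique_ptr<ToyModule> module_; // destroyed together with the engine
};

// Number of live modules while the owning engine still exists.
inline int live_modules_while_engine_alive() {
  int live = 0;
  ToyEngine engine(std::unique_ptr<ToyModule>(new ToyModule(&live)));
  return live;
}

// Number of live modules after the owning engine has been destroyed.
inline int live_modules_after_engine_destroyed() {
  int live = 0;
  {
    ToyEngine engine(std::unique_ptr<ToyModule>(new ToyModule(&live)));
  }
  return live;
}
```

Under this reading, a module handed to the engine should not also be deleted by another owner such as a std::unique_ptr; releasing it from that pointer first is one way to avoid a double delete.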