Bye Bye Bytecode

August 03, 2012

by John Shedletsky



If you are developing a scripting language, it turns out that allowing clients to load and execute arbitrary bytecode is a really bad idea. Lua, the embedded scripting language that ROBLOX uses, unfortunately allows this by default. After some deliberation, we decided to remove this capability from ROBLOX Lua.

What is bytecode?

Bytecode is a set of machine-readable instructions. Programming languages like Lua are high-level abstractions that make it easier for humans to write code. Before code can be executed by a computer or a virtual machine (VM), it needs to be translated to low-level instructions – the most fundamental operations that the CPU or VM supports.

A simple code snippet like this:

print("Hello World!")

Becomes this in bytecode (transformed into a standard hex dump for readability):

00000000  1B 4C 75 61 51 00 01 04 04 04 08 00 12 00 00 00  .LuaQ...........
00000010  3D 57 6F 72 6B 73 70 61 63 65 2E 53 63 72 69 70  =Workspace.Scrip
00000020  74 00 01 00 00 00 03 00 00 00 00 00 00 02 04 00  t...............
00000030  00 00 05 00 00 00 41 40 00 00 1C 40 00 01 1E 00  ......A@...@....
00000040  80 00 02 00 00 00 04 06 00 00 00 70 72 69 6E 74  €..........print
00000050  00 04 0C 00 00 00 48 65 6C 6C 6F 20 57 6F 72 6C  ......Hello Worl
00000060  64 00 00 00 00 00 04 00 00 00 02 00 00 00 02 00  d...............
00000070  00 00 02 00 00 00 03 00 00 00 00 00 00 00 00 00  ................
00000080  00 00  ..

In the standard Lua library, there is a function called loadstring that you can use to convert a string into a function. Going forward, this will still work:

fn = loadstring("print('Hello World!')")
fn()

However, if you prefix your argument to loadstring with ASCII character 27, it can load bytecode too:

fn = loadstring('\27\76\117\97\81\0\1\4\8\4\8\0\47\0\0\0\0\0\0\0\114\101\116' ..
'\117\114\110\32\102\117\110\99\116\105\111\110\40\41\32\10\112\114\105\110' ..
'\116\40\34\72\101\108\108\111\32\87\111\114\108\100\33\34\41\10\10\32' ..
'\101\110\100\0\1\0\0\0\4\0\0\0\0\0\0\2\4\0\0\0\5\0\0\0\65\64\0\0\28\64\0' ..
'\1\30\0\128\0\2\0\0\0\4\6\0\0\0\0\0\0\0\112\114\105\110\116\0\4\13\0' ..
'\0\0\0\0\0\0\72\101\108\108\111\32\87\111\114\108\100\33\0\0\0\0\0\4\0\0\0' ..
'\2\0\0\0\2\0\0\0\2\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0')
fn()

In both cases, calling fn() prints “Hello World!”.
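To see what sets the second form apart, you can ask a stock Lua interpreter to produce bytecode itself with string.dump and look at the first byte. This is only a sketch for a standalone interpreter, not ROBLOX code; the `loadstring or load` line is an assumption added so that it also runs on Lua versions newer than 5.1, where load takes over the job of loadstring.

```lua
-- Assumption: fall back to `load` on Lua versions that no longer ship loadstring.
local loadstring = loadstring or load

local source = "return 1 + 1"
local compiled = loadstring(source)      -- compile source text into a function
local bytecode = string.dump(compiled)   -- serialize that function as bytecode

print(source:byte(1))            -- 114, the letter "r"
print(bytecode:byte(1))          -- 27, the escape character
print(loadstring(bytecode)())    -- 2 on a stock interpreter that still accepts bytecode
```

Ordinary source text has no reason to start with an escape character, which is why that single byte is enough to tell the two kinds of input apart.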

Last night ROBLOX shipped a release that prevents the standard Lua library function loadstring from loading bytecode. Only some of our advanced scripters relied on this capability, and removing it breaks at most a couple hundred places that were using loadstring to do sinful (and sometimes very clever) things. So why did we do it? Two reasons.

#1. Security – loadstring(bytecode) is impossible to sandbox

ROBLOX sandboxes the Lua execution environment into multiple security contexts. We do this to create functions that only ROBLOX-authored scripts can call, or that only game servers can call. There are many functions, like loadCharacter(userId) or httpGet(url), that are very useful but that we don’t want regular users to have access to. For example, loadCharacter would allow widespread identity spoofing, and httpGet would allow someone to DDoS ROBLOX.com using our own game servers.
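One common way to build contexts like these in plain Lua is to give each script its own table of globals that contains only the functions its context may call. The sketch below assumes that approach purely for illustration. It is not how ROBLOX implements its security contexts, and make_env, run_in_context, the context names "roblox" and "user", and the stub httpGet are all hypothetical.

```lua
-- Hypothetical privileged API: the name mirrors the article, the body is a stand-in.
local privileged = {
  httpGet = function(url) return "fetched " .. url end,
}

-- Build the table of globals a script sees, based on its (hypothetical) context name.
local function make_env(context)
  local env = { print = print, tostring = tostring }
  if context == "roblox" then
    env.httpGet = privileged.httpGet   -- only trusted scripts get this function
  end
  return env
end

-- Compile source text and run it with the context's table as its globals.
local function run_in_context(src, context)
  local env = make_env(context)
  local fn, err
  if setfenv then                      -- Lua 5.1 and LuaJIT
    fn, err = loadstring(src)
    if fn then setfenv(fn, env) end
  else                                 -- Lua 5.2 and later
    fn, err = load(src, "=sandboxed", "t", env)
  end
  if not fn then return false, err end
  return pcall(fn)
end

print(run_in_context("return httpGet('http://example.com')", "roblox"))
-- true    fetched http://example.com
print(run_in_context("return httpGet('http://example.com')", "user"))
-- false, followed by an error message saying httpGet is nil
```

A boundary like this holds only as long as scripts can do nothing but run ordinary Lua code against the globals they were given.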

Unfortunately, the version of Lua that ROBLOX uses (5.1) has several virtual machine-level defects that can be exploited with the use of bytecode. ROBLOX user Necrobumpist found this article that discusses some of them. The nastiest one allows you to hijack the stack of another function running in another security context. In short, if you know what you are doing, you can craft a privilege elevation attack using bytecode.

The Lua community at one point was working on a bytecode validator for loadstring to prevent these sorts of shenanigans, but it was eventually deemed infeasible and it is now the responsibility of developers embedding Lua to filter out bad bytecode themselves (presumably by disabling it).
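Disabling bytecode can be as simple as refusing any chunk that starts with the escape byte, since stock Lua 5.1 uses the same first-byte check to choose between its parser and its bytecode loader. The wrapper below is a minimal sketch of that idea, not the actual ROBLOX change; text_only_loadstring and its error messages are made-up names, and the `loadstring or load` fallback is an assumption so that it runs on newer Lua versions too.

```lua
-- Assumption: fall back to `load` on Lua versions that no longer ship loadstring.
local raw_loadstring = loadstring or load

-- Hypothetical replacement for loadstring that accepts source text only.
local function text_only_loadstring(chunk, chunkname)
  if type(chunk) ~= "string" then
    return nil, "chunk must be a string"
  end
  if chunk:byte(1) == 27 then          -- 27 is ESC, the first byte of precompiled chunks
    return nil, "loading bytecode is disabled"
  end
  return raw_loadstring(chunk, chunkname)
end

local fn = text_only_loadstring("return 40 + 2")
print(fn())                                    -- 42
print(text_only_loadstring(string.dump(fn)))   -- nil    loading bytecode is disabled
```

A wrapper like this only helps if untrusted scripts cannot reach the original loadstring or the other loaders (load, loadfile, dofile), so a production fix would more plausibly live on the C side of the embedding.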

#2. Version Specific – loadstring(bytecode) is not future proof

ROBLOX tries very hard not to break any of the tens of millions of user scripts that exist in our ecosystem whenever we put out a new release, because we respect the time and effort that coders have spent making those scripts.

The ability of loadstring to ingest bytecode seriously compromises our ability to change anything relating to the underlying Lua internals without breaking client scripts. We could not, for instance, update to a more recent version of Lua than 5.1 or experiment with making our Lua script execution 2-10x faster by adopting LuaJIT. There are very good reasons for us to want to be able to do both of these things.
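The version dependence is visible in the bytecode itself: on the reference Lua interpreter, a precompiled chunk starts with the escape character, the letters "Lua", and then a byte naming the Lua version that produced it. The sketch below prints that header on whatever standalone interpreter runs it. It is illustrative only, and the `loadstring or load` fallback is an assumption for newer Lua versions.

```lua
-- Assumption: fall back to `load` on Lua versions that no longer ship loadstring.
local loadstring = loadstring or load

-- Dump a trivial function and keep the first five bytes of the result.
local header = string.dump(loadstring("return 1")):sub(1, 5)

print(header:byte(1))                            -- 27, the escape character
print(header:sub(2, 4))                          -- "Lua" on the reference interpreter
print(string.format("0x%02X", header:byte(5)))   -- 0x51 on Lua 5.1
```

The 51 in the fifth position of the hex dump earlier in this post is that version byte. A loader built for a different version, or an alternative implementation such as LuaJIT with its own bytecode format, will not accept the same bytes.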

How does this affect me?

It probably doesn’t affect you. We are aware that a small number of ROBLOX levels are using loadstring(bytecode) as an obfuscation method to protect their source code from theft or modification. These levels won’t function correctly until their creators update their code. If you are interested in learning more about low-level Lua hacks, the ROBLOX scripters forum is a great place to start.