Condense and Compress: Our Custom Binary File Format

May 23, 2013

by zeuxcg



Think about the performance improvements and rendering features we’ve brought to ROBLOX in the last year. Among them are featherweight parts, fast clusters, efficient collision detection and dynamic lighting. These improvements have broken through the limits on part count, physics simulation and aesthetic flexibility that builders once encountered, and opened doors to a world of possibilities. They’ve also exposed new limits as builders have pushed the boundaries further than ever. In this article, we’ll explain how we’ve achieved an approximately 100x reduction in the file size of ROBLOX places and a 5-10x reduction in load/save time.

Think about featherweight parts. The feature, which now applies to all ROBLOX parts (as opposed to only select shapes and materials), allows you to create a ROBLOX place with tens of thousands of bricks, and have it run on both desktop and mobile devices. However, a place with tens of thousands of parts can result in a very large file. Our test place, which we’ll reference throughout this article, has 50,000 parts and weighs in at a whopping 230 megabytes. That’s big enough to affect builders in multiple ways: rendering, saving/loading, publishing, loading the place from a server and more.

We decided the time was right to address the file size of ROBLOX creations, both to improve the user experience and to support the development of a yet-unannounced project. To that end, we will soon start testing a new file format that saves ROBLOX data in binary rather than XML. The change will be transparent to builders, but over time it will reduce the time they spend waiting. We will roll out the new format slowly, starting with the temporary files created for “Play Solo” and “Start Server” sessions in ROBLOX Studio, and eventually bring it to Personal Build Server storage, place uploads and local saves.

Out with the old

File size matters. It affects the time it takes to save and load a file locally, and publish to and load content from ROBLOX.com. Additionally, there is a file-size restriction on what can be published to ROBLOX.com (builders have found workarounds by writing scripts that generate additional parts after a game has loaded).

ROBLOX has long saved place data in XML format. It’s an intuitive, hierarchical format: each object has properties, each property has a value, and everything fits together in a nice tree of data. However, it doesn’t scale indefinitely, primarily because converting in-memory data to text (e.g., on save) takes time, and because the format is highly redundant: two nearly identical parts are stored as two complete, separate chunks of XML. Saved as XML, our test place with 50,000 parts and a file size of 230 megabytes takes 23 seconds to save and 37 seconds to load. XML has real benefits, such as being human-readable (useful for debugging) and forward/backward-compatible (useful in an environment where ROBLOX Studio and the ROBLOX Player are not always at the same version), but we’re growing out of the format.

We considered improving our XML parser, but knew the benefits would be limited. We decided to go big by overhauling the format in which we save ROBLOX data.

In with the new


One of the alternatives to the text-heavy XML format is binary. There are stark contrasts between the two.

  • Binary is not human-readable. However, binary data closely resembles its in-memory representation, meaning it’s easy for us to get an object that’s in memory and write it as a stream of bytes (and vice versa).
  • Rather than a hierarchy of objects, each containing all 30-odd properties and their values, ROBLOX data is stored in groups (i.e., for each property, we save the value for all objects in the place) to reduce repetition. This also makes it easier to skip loading deprecated data, which might occur if we remove a part property sometime in the future. We still get the benefit of forward and backward compatibility.
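To make the grouping concrete, here is a minimal Python sketch that stores objects property by property and loads them back while skipping properties the loader no longer recognizes. The dict-based object model and the names `group_by_property` and `load_columns` are hypothetical illustrations, not ROBLOX’s actual serializer, and the sketch assumes every object in a group shares the same set of properties (as instances of one class would).

```python
from typing import Any


def group_by_property(objects: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Store each property name once, followed by that property's value for every object."""
    columns: dict[str, list[Any]] = {}
    for obj in objects:
        for name, value in obj.items():
            columns.setdefault(name, []).append(value)
    return columns


def load_columns(columns: dict[str, list[Any]], known: set[str]) -> list[dict[str, Any]]:
    """Rebuild objects from columns, skipping properties this version doesn't know (e.g. removed ones)."""
    count = len(next(iter(columns.values()), []))
    objects: list[dict[str, Any]] = [{} for _ in range(count)]
    for name, values in columns.items():
        if name not in known:
            continue  # an unknown property is skipped as one whole column
        for obj, value in zip(objects, values):
            obj[name] = value
    return objects


parts = [
    {"Name": "Brick", "Anchored": True, "Transparency": 0},
    {"Name": "Brick", "Anchored": True, "Transparency": 0},
]
columns = group_by_property(parts)
print(columns)  # {'Name': ['Brick', 'Brick'], 'Anchored': [True, True], 'Transparency': [0, 0]}
print(load_columns(columns, {"Name", "Anchored"}))
```

Because a whole column can be dropped at once, the loader never has to inspect individual objects to discard a property that no longer exists.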

By saving a ROBLOX creation in binary format, we eliminate redundant references to properties (which cannot be added/removed by users and are often static) and reduce file sizes by roughly 10x. It’s not as simple as pushing some magical switch, though; we created a custom method of saving property values (integers) in a way that is both fast and compressible.

Converting property values for speed and compression

It starts with breaking each property value integer down into four bytes, written most significant byte first. Each byte can hold a value from 0 to 2^8-1 (or 255). Seven looks like this:

0 0 0 7

And a value of 258 (1 x 256 + 2) looks like this:

0 0 1 2
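The byte split comes down to a few shifts and masks. This Python sketch assumes the most-significant-first order shown in the examples above; the helper name `to_four_bytes` is our own.

```python
def to_four_bytes(n: int) -> list[int]:
    """Split an unsigned 32-bit integer into four bytes, most significant byte first."""
    if not 0 <= n < 2**32:
        raise ValueError("expected an unsigned 32-bit value")
    # Each shift moves one byte into the lowest 8 bits; & 0xFF keeps only those bits.
    return [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]


print(to_four_bytes(7))    # [0, 0, 0, 7]
print(to_four_bytes(258))  # [0, 0, 1, 2]
```

Python’s built-in `int.to_bytes(4, "big")` produces the same bytes.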

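Rather than writing each value’s four bytes together (row by row), our code writes the first byte of every value, then the second byte of every value, and so on (column by column). Because most property values in ROBLOX are small, their high-order bytes are zero, and those zeros end up stacked together. This sort of redundancy is great for compression.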

For example, let’s say we want to write the following integers to the file: 1, 5, 4, 258. A naïve approach would write their bytes consecutively, like this:

0 0 0 1 0 0 0 5 0 0 0 4 0 0 1 2

But reordered by column, they get written as:

0 0 0 0 0 0 0 0 0 0 0 1 1 5 4 2

This compresses much more efficiently because the zeros now form one long, repetitive run.
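The column ordering from this example can be sketched in Python as a transpose of the four-byte rows, along with the inverse a loader would need. Function names are hypothetical, and the sketch assumes the value count can be derived from the data length (four bytes per value).

```python
def to_four_bytes(n: int) -> list[int]:
    """Unsigned 32-bit integer -> four bytes, most significant first."""
    return [(n >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def interleave_columns(values: list[int]) -> bytes:
    """Write byte 0 of every value, then byte 1 of every value, and so on."""
    rows = [to_four_bytes(v) for v in values]
    return bytes(row[col] for col in range(4) for row in rows)


def deinterleave_columns(data: bytes) -> list[int]:
    """Undo interleave_columns: value i's bytes sit at offsets i, i+count, i+2*count, i+3*count."""
    count = len(data) // 4
    values = []
    for i in range(count):
        b0, b1, b2, b3 = (data[col * count + i] for col in range(4))
        values.append((b0 << 24) | (b1 << 16) | (b2 << 8) | b3)
    return values


packed = interleave_columns([1, 5, 4, 258])
print(list(packed))                  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 5, 4, 2]
print(deinterleave_columns(packed))  # [1, 5, 4, 258]
```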

Negative numbers throw a wrench into this technique. They’re not terribly common, but they can arise in ROBLOX data when, for example, a negative number is used to offset a GUI element. Stored as a 32-bit two’s complement integer (that is, as the unsigned value 2^32 + (-7) = 4,294,967,289), negative seven looks like this:

255 255 255 249
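The same bytes fall out of a short Python sketch: reducing a negative value modulo 2^32 gives exactly 2^32 + n, which is its two’s complement bit pattern. The helper name is hypothetical.

```python
def twos_complement_bytes(n: int) -> list[int]:
    """Bytes of a signed 32-bit integer in two's complement, most significant first."""
    if not -2**31 <= n < 2**31:
        raise ValueError("expected a signed 32-bit value")
    unsigned = n % 2**32  # for negative n this equals 2**32 + n
    return [(unsigned >> shift) & 0xFF for shift in (24, 16, 8, 0)]


print(twos_complement_bytes(-7))  # [255, 255, 255, 249]
print(twos_complement_bytes(7))   # [0, 0, 0, 7]
```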

All these non-zero bytes interfere with our goal: a highly compressible byte stream full of zeros. We engineered a second trick to translate negative integers into small positive integers. Anytime a value n is greater than or equal to 0, we run it through this formula:

2 x n

Anytime a number is less than 0, we treat it as:

2 x |n| - 1

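This means a value of 7 is represented as 14 (2 x 7), and a value of -7 is represented as 13 (2 x |-7| - 1). Non-negative values map to even numbers and negative values to odd numbers, so small values of either sign stay small. When we stack these numbers, the zeros accumulate and we have a very compressible file.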

0 0 0 14
0 0 0 13
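The two formulas describe what is widely known as zigzag encoding; Protocol Buffers uses the same mapping for its `sint32`/`sint64` types. A minimal Python sketch follows, with function names of our own choosing, including the inverse a loader would need and an equivalent bitwise form for signed 32-bit values.

```python
def zigzag_encode(n: int) -> int:
    """2n for n >= 0, 2|n| - 1 for n < 0, i.e. 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * n if n >= 0 else 2 * abs(n) - 1


def zigzag_encode_bitwise(n: int) -> int:
    """Same mapping for signed 32-bit n: n >> 31 is 0 when n >= 0 and -1 (all ones) when n < 0."""
    return (n << 1) ^ (n >> 31)


def zigzag_decode(z: int) -> int:
    """Even codes are non-negative values; odd codes are negative values."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


print(zigzag_encode(7), zigzag_encode(-7))   # 14 13
print(zigzag_decode(14), zigzag_decode(13))  # 7 -7
```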

All of these conversions and calculations happen at very little cost. For that reason, it’s much more efficient to handle this work up front than to rely entirely on a compression algorithm.
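One way to see why this cheap up-front work pays off is to compare raw byte layouts before any compressor runs. This Python sketch (sample values and helper names are illustrative) lays out a few small signed integers naively in two’s complement, then again after zigzag encoding plus column ordering, and measures the longest run of zero bytes in each. Every step is a single linear pass over the data.

```python
def be_bytes(u: int) -> list[int]:
    """Unsigned 32-bit integer -> four bytes, most significant first."""
    return [(u >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def naive_layout(values: list[int]) -> list[int]:
    """Two's complement bytes, one value after another (row order)."""
    return [b for v in values for b in be_bytes(v % 2**32)]


def transformed_layout(values: list[int]) -> list[int]:
    """Zigzag each value, then write the bytes column by column."""
    rows = [be_bytes(2 * v if v >= 0 else 2 * abs(v) - 1) for v in values]
    return [row[col] for col in range(4) for row in rows]


def longest_zero_run(data: list[int]) -> int:
    best = run = 0
    for b in data:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best


sample = [-7, 3, -2, 5]
print(naive_layout(sample))        # [255, 255, 255, 249, 0, 0, 0, 3, 255, 255, 255, 254, 0, 0, 0, 5]
print(transformed_layout(sample))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 6, 3, 10]
print(longest_zero_run(naive_layout(sample)), longest_zero_run(transformed_layout(sample)))  # 3 12
```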

After breaking the property value integers down to bytes, we automatically run the data through LZ4 compression, which is a lossless algorithm that quickly compresses the binary data at very little cost. This reduces the file size even further. Previously, there was no compression at this stage of the file-storage process – it only happened via Gzip’s automatic compression when a creation was transferred via HTTP to ROBLOX.com.
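Putting the steps together, an encoder zigzags each integer, writes its bytes column by column and compresses the result, and a decoder reverses each step. Python’s standard library has no LZ4 codec, so this sketch substitutes `zlib` purely as a stand-in lossless compressor (a third-party LZ4 binding, such as the `lz4` package’s `lz4.frame` module, could be swapped in). The function names, and deriving the value count from the data length, are assumptions rather than details of ROBLOX’s format.

```python
import zlib


def zigzag(n: int) -> int:
    return 2 * n if n >= 0 else 2 * abs(n) - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def encode_ints(values: list[int]) -> bytes:
    """Zigzag, order bytes by column, then compress (zlib stands in for LZ4)."""
    rows = [zigzag(v).to_bytes(4, "big") for v in values]
    columns = bytes(row[col] for col in range(4) for row in rows)
    return zlib.compress(columns)


def decode_ints(blob: bytes) -> list[int]:
    """Decompress, regroup each value's four bytes, then undo the zigzag mapping."""
    columns = zlib.decompress(blob)
    count = len(columns) // 4
    values = []
    for i in range(count):
        raw = bytes(columns[col * count + i] for col in range(4))
        values.append(unzigzag(int.from_bytes(raw, "big")))
    return values


values = [1, 5, 4, 258, -7, 0, -1]
blob = encode_ints(values)
print(decode_ints(blob) == values)  # True
```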

Testing the new format

With the binary file format and LZ4 compression, the 230-megabyte ROBLOX place is reduced to less than 1 megabyte. After Gzip compression, the file is around 100 kilobytes. Here are the results of two tests we ran internally:

Test 1
We built a test place with 50,000 parts, 150,000 ManualWeld joints and 8 million voxels.

| | XML | Binary |
|---|---|---|
| Size | 230 MB | 600 KB |
| Size (after Gzip) | 3.9 MB | 93 KB |
| Load time | 27 seconds | 3.7 seconds |
| Save time | 20 seconds | 0.5 seconds |

Test 2
We tested Welcome to the Neighborhood of ROBLOXia, an actual ROBLOX level built with 40,000 parts. The level does not include many joints, which explains the difference in numbers compared to the test place. Many intrepid builders have stripped joints from their levels via script to reduce file size, but at the cost of physical simulation. This should simplify life for them.

| | XML | Binary |
|---|---|---|
| Size | 112 MB | 1 MB |
| Size (after Gzip) | 3.5 MB | 700 KB |
| Load time | 15 seconds | 2 seconds |
| Save time | 9 seconds | 0.2 seconds |

Note: load/save times were measured on the server. Save times in Studio are the same; load times in Studio are slightly longer because Studio does some extra work to set up the system that handles undo/redo.
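As a quick check on the headline figures, the improvement ratios can be computed directly from the two tables. This Python snippet simply copies the table values, treating 1 MB as 1,000 KB (with 1,024 the size ratios come out slightly higher).

```python
# Values copied from the Test 1 and Test 2 tables as (XML, Binary).
# Sizes in KB (1 MB taken as 1,000 KB); times in seconds.
TESTS = {
    "Test 1": {"size": (230_000, 600), "size_gzip": (3_900, 93),
               "load": (27, 3.7), "save": (20, 0.5)},
    "Test 2": {"size": (112_000, 1_000), "size_gzip": (3_500, 700),
               "load": (15, 2), "save": (9, 0.2)},
}


def improvement_ratios(rows: dict[str, tuple[float, float]]) -> dict[str, float]:
    """XML figure divided by binary figure for each metric (higher is better)."""
    return {metric: xml / binary for metric, (xml, binary) in rows.items()}


for name, rows in TESTS.items():
    ratios = improvement_ratios(rows)
    print(name, {metric: round(r, 1) for metric, r in ratios.items()})
```

The uncompressed size ratios work out to roughly 383x and 112x, the Gzip-compressed sizes shrink by about 42x and 5x, load time improves by about 7.3x and 7.5x, and save time by 40x and 45x.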


In both tests, there are improvements on the order of a 100x reduction in file size and a 5-10x reduction in load/save time. As stated at the beginning of this article, we will roll out the binary file format slowly, starting with the temporary files that are created when you start a play session in ROBLOX Studio using “Play Solo” or “Start Server”. We have done our due diligence to ensure this won’t corrupt your data, but if you notice any odd behavior (e.g., your parts are not appearing as they should or, less likely, you cannot open a file), we want to hear about it.

We will eventually begin storing data in binary format for Personal Build Server and local saves. At that point, you should see an even greater benefit. Our hope is you’ll simply ease into a smoother build, save and publish process, and enjoy a subtle but significant increase in the quality of the ROBLOX experience.