Featherweight Parts: One Million Parts, One ROBLOX World

December 21, 2012

by Gemlocker



Featherweight parts, despite their name, have nothing to do with feathers, flying or anything bird-related. They do, however, have a lot to do with weight: while old ROBLOX parts are “heavy” memory users, featherweight parts are “super light,” allowing us to load and render more of them with the same computing power. The first iteration of featherweight parts has now been released for ROBLOX on all platforms, and it has already proven revolutionary in some ROBLOX places.

The following table shows the number of milliseconds required to render one frame. In a future release, these numbers will improve even further as we featherweight more materials, surfaces, and primitives.

Place name          | No featherweight parts | With featherweight parts
Block Town          | 80 ms                  | 16 ms
ROBLOX Battle       | 17 ms                  | 6 ms
DriveBlox Unlimited | 67 ms                  | 46 ms
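
To put the frame times above in more familiar terms, the short sketch below converts each one to frames per second and computes the speedup. The numbers are copied from the table; the function and variable names are made up for illustration.

```python
# Frame times in milliseconds, copied from the table above:
# (without featherweight parts, with featherweight parts).
frame_times_ms = {
    "Block Town": (80, 16),
    "ROBLOX Battle": (17, 6),
    "DriveBlox Unlimited": (67, 46),
}


def frames_per_second(frame_ms):
    """Convert a per-frame render time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


def summarize(times):
    """Return fps before/after and the speedup factor for each place."""
    rows = {}
    for place, (before_ms, after_ms) in times.items():
        rows[place] = {
            "fps_before": frames_per_second(before_ms),
            "fps_after": frames_per_second(after_ms),
            "speedup": before_ms / after_ms,
        }
    return rows


summary = summarize(frame_times_ms)
for place, row in summary.items():
    print(f"{place}: {row['fps_before']:.1f} -> {row['fps_after']:.1f} fps "
          f"({row['speedup']:.1f}x faster)")
```

For Block Town, 80 ms per frame works out to 12.5 frames per second, while 16 ms per frame is 62.5 frames per second, a 5x improvement.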

Featherweight Technology

We’ve talked about featherweight parts before, but only in passing – there have been rumblings of a “500,000 parts update” and we showcased them in front of a small fraction of users at this year’s ROBLOX Game Conference. The initial vision for the project was to allow ROBLOX builders to work with a vastly increased number of parts per world (i.e., to go from 40,000-50,000 parts running at a stable frame rate to 1 million). That’s still our vision and we’re well on our way. But the first iteration of featherweight parts allows us to bring 10,000+ part levels, which would have previously only run well on the desktop, to the hardware-constrained mobile environment.

If you have an iPad, we recommend you try the featherweight version of Block Town, a sprawling city built by members of the Elite Builders of ROBLOXia that runs smooth as butter. It simply wouldn’t perform on mobile without featherweight parts.

The featherweight parts project is big: to accomplish our vision, we need to optimize memory usage, make rendering more scalable, manage the physical simulation of more parts, get users into games quickly and control the file sizes of saved levels. We started with optimizations to memory usage and rendering code.

Memory optimization: dictionaries and signals

Before getting into the technical details, it’s worth noting that we’ve reduced the amount of memory used by a standard brick from about 15 kilobytes to 3.5 kilobytes (3 KB for physics, 0.5 KB for memory/rendering). The decrease, in part, stemmed from the observation that a lot of parts in the average ROBLOX world are substantially similar.
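
As a back-of-the-envelope check on what those per-part figures mean at scale, the sketch below multiplies them out for a 50,000-part world and a 1-million-part world. It assumes every part is a standard brick at the quoted average size and that 1 MB = 1000 KB, so the results are rough estimates rather than measurements.

```python
# Per-part memory figures quoted above, in kilobytes.
OLD_KB_PER_PART = 15.0
NEW_KB_PER_PART = 3.5


def total_megabytes(part_count, kb_per_part):
    """Estimate total part memory in MB (assumes 1 MB = 1000 KB)."""
    return part_count * kb_per_part / 1000.0


for count in (50_000, 1_000_000):
    old_mb = total_megabytes(count, OLD_KB_PER_PART)
    new_mb = total_megabytes(count, NEW_KB_PER_PART)
    print(f"{count:>9,} parts: {old_mb:,.0f} MB before, {new_mb:,.0f} MB after")
```

Under these assumptions, a million old-style parts would need roughly 15,000 MB (about 15 GB), versus about 3,500 MB with the reduced footprint.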

Rather than store every piece of data for each similar part individually, with featherweight parts we group shared data in a sort of “dictionary.” For instance, if you have 100 of the same blue, non-moving bricks, we’ll store all the information they share – material, color, transparency, reflectance, AlphaModifier, locked and name – in one chunk of memory. Each part draws on that piece of memory for much of its information and carries its own coordinate frame and unique ID.

[Figure: Old Part Storage] The old method of storing parts in memory. Each part stores its properties individually.

[Figure: New Part Storage] The new method of storing parts in memory. Parts with the same combination of properties use less memory by storing shared properties once in a “dictionary.”

As long as there are two or more parts with the same combination of the seven required properties, we’ll create the dictionary entry for them and reduce the amount of memory they’d otherwise use.
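
The shared-property “dictionary” resembles what programmers often call the flyweight pattern. The sketch below is only an illustration of that idea, not ROBLOX’s engine code: `SharedProps` and `PartStore` are hypothetical names, and for simplicity it creates an entry even for a single part, whereas the text describes entries being created once two or more parts match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedProps:
    """The seven properties that identical parts can share (hypothetical type)."""
    material: str
    color: str
    transparency: float
    reflectance: float
    alpha_modifier: float
    locked: bool
    name: str


class PartStore:
    """Stores per-part data plus one shared entry per property combination."""

    def __init__(self):
        self._entries = {}  # SharedProps -> the single canonical instance
        self.parts = []     # (unique ID, coordinate frame, shared entry)

    def add_part(self, part_id, cframe, props):
        # Reuse the existing entry if an equal combination is already stored.
        shared = self._entries.setdefault(props, props)
        self.parts.append((part_id, cframe, shared))
        return shared

    def entry_count(self):
        return len(self._entries)


store = PartStore()
for i in range(100):
    # 100 identical blue bricks, each with its own position and ID.
    blue = SharedProps("Plastic", "Bright blue", 0.0, 0.0, 1.0, False, "Brick")
    store.add_part(i, (float(i), 0.0, 0.0), blue)
store.add_part(100, (0.0, 5.0, 0.0),
               SharedProps("Wood", "Brown", 0.0, 0.0, 1.0, False, "Plank"))

print(len(store.parts), "parts share", store.entry_count(), "dictionary entries")
```

Each part still carries its own coordinate frame and ID; only the seven shared properties are stored once per combination.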

When a group of featherweight parts is destroyed by an explosion (or some other environmental change that causes them to move), they go back to being individuals in memory. You might expect this to decrease performance, since a lot of information is quickly copied between objects, but our experiments show that expanding and shrinking dictionary entries does not significantly stress the engine. The drop in performance is less than 10%, even in chaotic situations.

If you look at a ROBLOX part and its other memory-consuming elements, you’ll see that some elements are only used upon being triggered. In ROBLOX, a common example is signals. Signals represent different events a part can have in its life. For example, you can use a signal to have a part spawn an effect upon being touched by a player.

In the past, players were always “subscribed” to each part’s signals, meaning they were always paying the memory cost associated with them (even when the signals weren’t active). We redesigned this structure so a part’s signals do not use any memory until you subscribe to them for the first time.
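
One common way to get this behavior is lazy allocation: the signal object is only created when the first subscriber connects. The sketch below illustrates that general technique under our own assumptions; the `Signal` and `Part` classes and the `connect_touched` method are hypothetical and are not the ROBLOX API.

```python
class Signal:
    """A minimal event object that holds a list of handlers."""

    def __init__(self):
        self._handlers = []

    def connect(self, handler):
        self._handlers.append(handler)

    def fire(self, *args):
        for handler in list(self._handlers):
            handler(*args)


class Part:
    def __init__(self, name):
        self.name = name
        self._touched = None  # no Signal object exists yet

    def connect_touched(self, handler):
        # Allocate the signal only when someone first subscribes.
        if self._touched is None:
            self._touched = Signal()
        self._touched.connect(handler)

    def has_touched_signal(self):
        return self._touched is not None

    def touch(self, other):
        # With no subscribers there is nothing to fire and nothing allocated.
        if self._touched is not None:
            self._touched.fire(other)


wall = Part("Wall")
lava = Part("Lava")
events = []
lava.connect_touched(lambda who: events.append(f"{who} touched Lava"))

wall.touch("Player1")  # no subscribers, so no signal is ever created
lava.touch("Player1")  # runs the subscribed handler
print(events)
```

In this sketch, the wall never pays for a signal it does not use, while the lava brick allocates one the moment a handler is attached.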

Rendering: optimized batching

[Figure: The featherweight version of Block Town]

For basic rendering, your CPU passes commands to your GPU (video card) – it says, “draw this, then draw this, then draw this,” and so forth. The CPU’s ability to pass commands often becomes the rendering bottleneck, especially on lower-end devices like the iPad. 3D worlds are generally rendered in “batches” to reduce the total number of commands and avoid the bottleneck. We’ve been doing this for a while, but when we get to 50,000 parts, there are so many draw commands that ROBLOX slows down even on good hardware. This makes increased part counts a big problem.

We have always been conservative about batching parts because we run a comprehensive physics simulation, meaning any part in a larger object can move at any moment. In those cases, we have to break apart the batch and render the parts individually, which is an expensive process. We’ve observed, however, that static parts – the stuff that makes up environments, for example – don’t change often. With featherweight parts we’ve moved toward batching parts into larger groups. For example, an object consisting of 1,000 parts can be rendered with one command to the GPU.
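
To make the saving concrete, the sketch below counts draw commands when anchored parts of the same object are grouped into one batch and everything else is drawn individually. It is a simplified model with hypothetical field names (`id`, `group`, `anchored`), not the real renderer, which has to consider materials, geometry and much more.

```python
from collections import defaultdict


def build_draw_commands(parts):
    """Group anchored parts into one batch per object; draw the rest one by one.

    `parts` is a list of dicts with the keys 'id', 'group' and 'anchored'.
    """
    batches = defaultdict(list)
    commands = []
    for part in parts:
        if part["anchored"]:
            batches[part["group"]].append(part["id"])
        else:
            # Parts that may move get their own draw command.
            commands.append(("draw_part", part["id"]))
    for group, ids in batches.items():
        commands.append(("draw_batch", group, len(ids)))
    return commands


# A static 1,000-part object plus a single movable part.
parts = [{"id": i, "group": "city", "anchored": True} for i in range(1000)]
parts.append({"id": 1000, "group": "car", "anchored": False})

commands = build_draw_commands(parts)
print(len(parts), "parts ->", len(commands), "draw commands")
```

Here 1,001 parts collapse into two commands: one batch for the static object and one draw for the movable part.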

If any one of those 1,000 parts does happen to move, the object separates into smaller parts. This used to be a slow process with a significant performance spike, but we’re rewriting our rendering code and have already made the process about 10x faster. That’s partly because the code is more efficient, but also because we’re not supporting all parts (e.g., characters, cylinders, spheres) or all rendering techniques (e.g., rounded edges via bevels). We’re focusing the improvements on high bang-for-buck instances.

Coming soon: texture optimization

Another way we’re improving rendering is making material textures more efficient. Essentially, we’ve created a texture “atlas” that stores multiple textures in one place. So, rather than issuing four commands to draw two bricks with different textures:

Select wood texture > draw brick 1 > select brick texture > draw brick 2

We can do the same thing in three commands:

Select texture atlas > draw brick 1 > draw brick 2

If you’re using five textures, for example, you start to see a significant improvement in rendering time. This functionality is not included in the initial release of featherweight parts – as stated above, the initial release is laser-focused on improving performance in very narrow, specific ways – but it is coming soon.
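
The command counts above generalize in a simple way, which the sketch below works through. It assumes a texture has to be selected every time it differs from the previous brick’s texture, and that an atlas needs to be selected only once; the function names are made up for illustration.

```python
def commands_without_atlas(bricks):
    """Select a texture whenever it changes, then draw each brick."""
    commands = []
    current_texture = None
    for brick_name, texture in bricks:
        if texture != current_texture:
            commands.append(f"select {texture} texture")
            current_texture = texture
        commands.append(f"draw {brick_name}")
    return commands


def commands_with_atlas(bricks):
    """Select the shared atlas once, then draw every brick."""
    commands = ["select texture atlas"]
    for brick_name, _texture in bricks:
        commands.append(f"draw {brick_name}")
    return commands


two_bricks = [("brick 1", "wood"), ("brick 2", "brick")]
five_bricks = [(f"brick {i + 1}", texture) for i, texture in
               enumerate(["wood", "brick", "slate", "grass", "ice"])]

for bricks in (two_bricks, five_bricks):
    print(len(bricks), "bricks:",
          len(commands_without_atlas(bricks)), "commands without atlas,",
          len(commands_with_atlas(bricks)), "with atlas")
```

With two differently textured bricks the count drops from four commands to three; with five it drops from ten to six, which is why the benefit grows with the number of textures in use.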

Harnessing Featherweight Parts

If you haven’t already gathered this, there are caveats for leveraging featherweight parts today. We’re writing brand-new code and it’ll take some time before it supports every ROBLOX feature. Featherweight parts currently support:

  • All materials except Corroded Metal
  • Flat, Stud, and Inlet surfaces
  • Anchored parts (or parts welded to anchored parts)
  • All materials when a user’s graphics settings are set to low (i.e., they’re using low-end hardware)

For the future, we’re planning to make every “sleeping” part featherweight, too. A part doesn’t have to be anchored or welded to something anchored; it just has to be still for, say, four seconds. There’s a cost to transitioning between awake and sleeping, which slows rendering and can cause flickering, so we have to optimize this process before implementing the change.
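
A stillness timer is one simple way to decide when a part counts as “sleeping.” The sketch below is a guess at the general shape of such logic, not the planned implementation: the `SleepTracker` class is hypothetical, and the four-second threshold is just the example figure mentioned above.

```python
SLEEP_AFTER_SECONDS = 4.0  # the "say, four seconds" example, not a confirmed value


class SleepTracker:
    """Marks a part as asleep once it has been still for long enough."""

    def __init__(self, sleep_after=SLEEP_AFTER_SECONDS):
        self.sleep_after = sleep_after
        self.still_time = 0.0
        self.asleep = False

    def update(self, dt, moved):
        """Advance the timer by `dt` seconds; any movement wakes the part."""
        if moved:
            self.still_time = 0.0
            self.asleep = False
        else:
            self.still_time += dt
            if self.still_time >= self.sleep_after:
                self.asleep = True
        return self.asleep


tracker = SleepTracker()
# Four one-second steps without movement, then one step with movement.
states = [tracker.update(1.0, moved=False) for _ in range(4)]
woke_state = tracker.update(1.0, moved=True)
print(states, woke_state)
```

Every flip of `asleep` in this model corresponds to one of the awake/sleeping transitions whose cost is described above, so a longer threshold trades fewer transitions for a slower switch to featherweight.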

What’s really exciting is adding featherweight efficiency to moving parts. Senior Rendering Engineer Arseny Kapoulkine is working on a “flex cluster” project that will allow us to transition parts between static featherweight and moving featherweight without relying on any old code. That’s when you’ll see the full benefit of featherweight parts, and you can be sure we’ll tell you about it.