Scaling Thumbnail Dependencies with a Rapidly Growing User Base

June 06, 2013

by Matt Dusek



ROBLOX allows users to upload images in order to create unique avatar clothing and environments. If you’ve ever uploaded a custom asset, you’ve undoubtedly come across a thumbnail image that says “Review Pending.” That’s because every single image that is uploaded to our storage cloud must be moderated for inappropriate content. Users can’t see the thumbnail of the content until it’s been inspected and approved. In this blog article, we’ll walk you through not only the approval process, but also our recently changed process for creating and tracking every thumbnail you see on ROBLOX.

Let’s first step back and define exactly how thumbnail generation works. Uploading a place, model, clothing or decal automatically generates a thumbnail on our website–every one of these thumbnails has a unique ID that is associated with a unique asset ID. This association indicates that the specified asset was used to create a thumbnail of the specified content. The association between the asset and the thumbnail is called a thumbnail dependency. There can be more than one dependency record per thumbnail.

[Image: thumbnail dependency example]

This illustrates that one thumbnail can have multiple dependencies. In this case, this avatar has two: a shirt and a hat.
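To make the idea concrete, here is a minimal sketch of dependency records as plain (thumbnail, asset) pairs. The class name, the function name, and all of the IDs are hypothetical and chosen only for illustration; this is not the actual ROBLOX schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ThumbnailDependency:
    """One record: this asset was loaded to render this thumbnail."""
    thumbnail_id: int
    asset_id: int


# Hypothetical IDs: one avatar thumbnail that loaded a shirt and a hat.
AVATAR_THUMBNAIL = 9001
SHIRT_ASSET = 101
HAT_ASSET = 202

records = [
    ThumbnailDependency(AVATAR_THUMBNAIL, SHIRT_ASSET),
    ThumbnailDependency(AVATAR_THUMBNAIL, HAT_ASSET),
]


def assets_by_thumbnail(deps):
    """Group dependency records by the thumbnail they belong to."""
    grouped = defaultdict(set)
    for dep in deps:
        grouped[dep.thumbnail_id].add(dep.asset_id)
    return dict(grouped)


grouped = assets_by_thumbnail(records)
```

Here `grouped` maps the single avatar thumbnail to both of its assets, which matches the two-dependency case in the illustration.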

Tracking character thumbnail dependencies has helped us understand how personalized content moves around on ROBLOX. Thumbnails are probably something you take for granted, but think about how big ROBLOX is. Now try to imagine how many thumbnails we generate, track in our database, and store on servers all around the globe (hint: it’s in the billions).

An essential feature of our thumbnailing system is the ability to track every piece of content that was loaded to produce a given thumbnail. This allows us to invalidate and recreate the thumbnail of your work whenever you change any part of it. It’s also how we’re able to immediately remove the “Review Pending” thumbnail and replace it with the actual thumbnail the moment your uploaded content is approved.
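One way to picture the “Review Pending” swap is a small lookup that shows the rendered image only once every asset it depends on has been approved. This is a sketch under assumptions: the placeholder file name, the function name, and the idea of checking approval at read time are all hypothetical, and the real system may instead regenerate the thumbnail when approval happens.

```python
# Hypothetical placeholder file name, for illustration only.
REVIEW_PENDING = "review-pending.png"


def visible_thumbnail(rendered_image, dependency_asset_ids, approved_asset_ids):
    """Return the rendered image only if every asset it depends on is approved."""
    if all(asset_id in approved_asset_ids for asset_id in dependency_asset_ids):
        return rendered_image
    return REVIEW_PENDING


# Hypothetical asset 101 is a freshly uploaded shirt image.
before_approval = visible_thumbnail("shirt.png", {101}, approved_asset_ids=set())
after_approval = visible_thumbnail("shirt.png", {101}, approved_asset_ids={101})
```

Before approval the function yields the placeholder; once asset 101 is in the approved set, the same call yields the real image.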

Until recently, we stored the record of every single thumbnail dependency in a single SQL Server database. We have long known that, eventually, there would simply be too many dependencies to keep tracking them this way. We monitor the growth of all our resources closely, and we recently decided that it was time to upgrade the way we track our thumbnail dependencies with a solution that could grow with ROBLOX well into the future. When we reached this point, we were managing over 14 billion thumbnail dependency records.

Frequently, when a scaling challenge presents itself, we develop technology internally to solve the issue. Sometimes, however, the perfect tools are already available. In this particular instance, the technology to address the problem existed in the form of Amazon’s recently released web service, DynamoDB, coupled with its more established counterpart, the Simple Queue Service (SQS). DynamoDB is more restrictive than our existing relational database, though it offers practically limitless data storage. Because it’s narrower in terms of how data can be represented, we remodeled our data structures to be compatible with it. Then we slowly and methodically began to migrate all of our thumbnail dependencies data into DynamoDB.
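DynamoDB identifies each item by a primary key, which can combine a hash key with a range key, and a query fetches the items that share one hash key. The sketch below imitates that shape in memory to show one way dependency data could be modeled for it. Using the asset ID as the hash key and the thumbnail ID as the range key is purely an assumption for illustration, and `HashRangeTable` is a hypothetical stand-in, not the AWS SDK.

```python
from collections import defaultdict


class HashRangeTable:
    """In-memory stand-in for a table keyed by (hash key, range key)."""

    def __init__(self):
        # hash key -> {range key -> attributes}
        self._partitions = defaultdict(dict)

    def put_item(self, hash_key, range_key, attributes=None):
        """Insert or overwrite the item stored under (hash_key, range_key)."""
        self._partitions[hash_key][range_key] = dict(attributes or {})

    def query(self, hash_key):
        """Return the range keys stored under one hash key, in sorted order."""
        return sorted(self._partitions.get(hash_key, {}))


# Assumed layout: hash key = asset ID, range key = thumbnail ID.
table = HashRangeTable()
table.put_item(101, 9001)  # shirt image -> small avatar thumbnail
table.put_item(101, 9002)  # shirt image -> large avatar thumbnail
table.put_item(202, 9001)  # hat -> small avatar thumbnail

thumbnails_using_shirt = table.query(101)
```

With this layout, the question “which thumbnails used this asset?” becomes a lookup on a single hash key rather than a join, which is the kind of access pattern a key-value store handles well.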

We’re proud to report that we were able to initiate this process without negatively affecting a single user. The migration also gave us the opportunity to take a close look at the ways we distribute data throughout ROBLOX. To illustrate some of the changes we’ve made, let’s look at how our data distribution method worked before and how it stands now.

Before, the web servers talked directly to the database servers in order to move data into the web layer. When a piece of content changed (like when a character clothing asset was approved), the web server was responsible for every step in coordinating the generation of a new thumbnail: communicating with the databases, requesting resources from our ROBLOX Compute Cloud, composing the Lua scripts that are the recipe for producing a thumbnail, storing the results in our content cloud, and tracking dependencies. We’ll stop there; the point is, it’s a lot of work. All of this was being done in addition to the web server’s core job of serving up webpages to your browser.

Now, we’ve moved those responsibilities into a dedicated “Thumbnails Service.” One of its components is a dedicated thumbnail dependencies processor that retrieves tasks from two queues: one populated with dependency information whenever a new thumbnail is generated, and the other populated with a signal that a piece of content has changed and that the thumbnails depending on it need to be invalidated. So if an asset changes, the processor works to find all of the approved thumbnails that used the asset when they were created. Even if it’s an image that only a single shirt uses, there can be multiple thumbnails, because thumbnails come in many sizes and formats. The dependencies processor then informs the Thumbnails Service that each of those thumbnails needs to be re-created. This leaves the website entirely out of the process; the dependencies processor’s full-time job is to handle this computing. This approach is in keeping with the overall notion that we want to build and utilize small components that do one thing well, as opposed to big, monolithic applications that attempt to do everything.
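The two-queue flow described above can be sketched with Python’s standard `queue` module standing in for the real queuing service. Every name here (`new_dependency_queue`, `asset_changed_queue`, `process_once`, the IDs) is hypothetical, and the list `regeneration_requests` simply stands in for whatever call tells the thumbnail service to re-create an image.

```python
import queue
from collections import defaultdict

new_dependency_queue = queue.Queue()  # (thumbnail_id, asset_id) pairs
asset_changed_queue = queue.Queue()   # asset_ids whose content changed

dependents = defaultdict(set)         # asset_id -> thumbnail_ids that used it
regeneration_requests = []            # stand-in for calls to the thumbnail service


def drain(q):
    """Yield every message currently waiting in the queue."""
    while True:
        try:
            yield q.get_nowait()
        except queue.Empty:
            return


def process_once():
    """Record new dependencies, then invalidate thumbnails of changed assets."""
    for thumbnail_id, asset_id in drain(new_dependency_queue):
        dependents[asset_id].add(thumbnail_id)
    for asset_id in drain(asset_changed_queue):
        for thumbnail_id in sorted(dependents.get(asset_id, ())):
            regeneration_requests.append(thumbnail_id)


# One shirt image (101) was used by two thumbnail sizes (9001, 9002).
new_dependency_queue.put((9001, 101))
new_dependency_queue.put((9002, 101))
new_dependency_queue.put((9001, 202))  # the small thumbnail also used a hat
asset_changed_queue.put(101)           # the shirt image changed
process_once()
```

After this run, `regeneration_requests` holds both thumbnails that used the shirt image, and the web server never took part in the work.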

By moving away from our SQL Server solution and onto DynamoDB, we’re saving work, making our systems easier to understand, and saving some serious cash to boot. We’ve developed a cheaper, more scalable solution, and we were able to do so without affecting anyone’s day-to-day ROBLOX experience. That’s like rebuilding a car while it’s moving 120 miles per hour. This new method ensures that we’ll be serving billions and billions of thumbnails for a long time to come.

This opens up an exciting opportunity for the future as well: the idea of “elastic provisioning,” wherein we provision and decommission servers throughout each day based on utilization. When we hit a ton of traffic, we boost the number of dedicated servers to compensate, and drop servers when things cool off. This is hardly a new idea, but it’s one that forward-thinking architectural changes make possible. We’ll be sure to let you know when these changes go into effect.
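As a rough illustration of the arithmetic behind elastic provisioning, the sketch below sizes a server pool from the current load. The function name, the per-server capacity, and the minimum and maximum pool sizes are all made-up assumptions, not figures from our infrastructure.

```python
import math


def servers_needed(requests_per_second, capacity_per_server,
                   min_servers=2, max_servers=50):
    """Return how many servers to run for the current load, within fixed bounds."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    wanted = math.ceil(requests_per_second / capacity_per_server)
    return max(min_servers, min(max_servers, wanted))


# Hypothetical loads, assuming each server handles 500 requests per second.
peak = servers_needed(4500, 500)
quiet = servers_needed(100, 500)
```

At the hypothetical peak the pool grows to nine servers, and during the quiet period it falls back to the configured minimum of two.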

Ultimately, we learned a lot by looking at thumbnail dependencies; not just about our users, but about how we maintain massive quantities of data, and how we can be more efficient.