A little over a year ago, we took an in-depth look at what happens every time you hit the “Play” button. Fact is, a lot goes on behind the scenes when you start a new instance of ROBLOX. Since then, we’ve made several changes in our never-ending quest to make getting in and out of games as seamless and smooth as possible. We tapped ROBLOX Web Engineer Vlad Fridman to explain the dramatic infrastructural changes that have taken place in the last couple of months, and to tell us where they’ll take us in the near future.
At any given moment, thousands of ROBLOX players are joining games. Hundreds of new game instances are starting while hundreds are being shut down. Put simply: ROBLOX is a massive system, and making that system work both quickly and reliably is an ongoing challenge. With over 80,000 concurrent players in thousands of running games (and many more on the website and in Studio), tracking the state of every player/game instance becomes extremely important. We track the state of every game instance on ROBLOX and use performance metrics we collect to help us create the greatest cloud-gaming experience possible.
Originally, we used an SQL database to track information. Ultimately, we realized that this method had bugs and presented challenges when scaling to our growing audience. Rather than repeatedly patch the system, we opted to start anew. We created a custom matchmaking system, and coded it in a way that would allow it to work side-by-side with the old implementation. When testing our new matchmaking system, we switched back and forth four times on production–without anyone noticing a thing!
Rewriting our database
We learned that our relational SQL database wasn’t the best way to monitor and track every running game on ROBLOX, so we wrote our own in-memory datastores to supplement our matchmaking system. Our new datastores are optimized for performance and scalability. Allow me to explain in a general sense what a datastore is. There are three different datastores in ROBLOX: the Player State, Game State and Server State stores. Each of these datastores is “shardable,” meaning we can split its work across multiple servers, with each server handling a portion of the load. Using sharding, we can keep scaling the system by adding servers, which was far from the case with our SQL database, limited as it was by the available power of a single machine. Let’s go over each of the types of datastores we use.
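To make the idea of sharding concrete, here is a minimal sketch in Python. It is not our production code: the `shard_for` function, the `ShardedStore` class and the hash-based routing are all assumptions chosen for illustration, and each shard is modeled as a plain dictionary rather than a separate server.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash (illustrative choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Hypothetical in-memory store whose keys are spread across shards."""

    def __init__(self, shard_count: int) -> None:
        # In a real deployment each shard would live on its own server.
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Because the same key always hashes to the same shard, any front-end server can find the right shard without a central lookup. Note that this simple modulo scheme would move most keys if the shard count changed; it is only meant to show the principle.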
Player State Datastore
The Player State Datastore is simultaneously the busiest and simplest component of the system: it handles 8,000 requests a second. It is responsible for tracking user location in-game, and its primary function is to know, at all times, where any given player is. If a user joins “Space Knights” from an iPad, then joins “Apocalypse Rising” from a laptop with the same account, he or she will be kicked out of “Space Knights.” That’s a small example of one of the thousands of tasks the Player State Datastore is constantly performing.
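The “one location per account” rule can be sketched in a few lines of Python. The `PlayerStateStore` class and its method names are hypothetical, chosen only to illustrate the behavior described above; the real datastore is sharded and far busier.

```python
from typing import Dict, Optional


class PlayerStateStore:
    """Hypothetical sketch: tracks the single game each player is in."""

    def __init__(self) -> None:
        self._location: Dict[int, str] = {}

    def join(self, player_id: int, game_instance: str) -> Optional[str]:
        """Record the new location; return the game to kick the player from, if any."""
        previous = self._location.get(player_id)
        self._location[player_id] = game_instance
        if previous is not None and previous != game_instance:
            return previous
        return None

    def leave(self, player_id: int, game_instance: str) -> None:
        # Ignore stale "leave" messages from a game the player already left.
        if self._location.get(player_id) == game_instance:
            del self._location[player_id]

    def where(self, player_id: int) -> Optional[str]:
        return self._location.get(player_id)
```

In this sketch, joining “Apocalypse Rising” while still in “Space Knights” returns “Space Knights,” which the caller would use to disconnect the older session.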
Game State Datastore
The Game State Datastore keeps track of all instances of running games, all logged-in users and guests inside each game, and also manages reservation spots. What’s a reservation spot?
Each and every time one of our gamers hits the “Play” button, the Game State Datastore checks to see if there are any running instances of the game with room for that user. If there aren’t (as is often the case; most running games are full), a new “pending” game is created in the Datastore, and a spot is held for the user in that particular game instance. Then, the game server starts, and the user is connected to a brand new game instance.
If there are already games with open slots running, the Game State Datastore will try to find the game with the fewest open slots, meaning you’ll usually be dropped into a game that is almost full. We assume the fuller the game, the better. We’re looking at ways to drop you into games by finding servers with the lowest latency to you. Stay tuned, we’ll be talking more about this feature in a coming article.
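The selection rule above can be sketched as follows. This is a simplified illustration, not the actual matchmaking code: the `GameInstance` and `GameStateStore` names, the fixed capacity per instance, and the `count` parameter (which lets a caller hold more than one spot at once) are all assumptions.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class GameInstance:
    instance_id: int
    capacity: int
    players: int = 0
    reserved: int = 0
    pending: bool = False  # True until a game server has actually started it

    @property
    def open_slots(self) -> int:
        return self.capacity - self.players - self.reserved


class GameStateStore:
    """Hypothetical sketch of reservation handling for a single game."""

    def __init__(self, capacity_per_instance: int) -> None:
        self.capacity = capacity_per_instance
        self.instances: Dict[int, GameInstance] = {}
        self._ids = itertools.count(1)

    def reserve(self, count: int = 1) -> GameInstance:
        """Hold `count` spots in the fullest instance that fits, else a new one."""
        if not 1 <= count <= self.capacity:
            raise ValueError("count must fit in a single instance")
        candidates = [g for g in self.instances.values() if g.open_slots >= count]
        if candidates:
            # Fewest open slots first: prefer the game that is almost full.
            game = min(candidates, key=lambda g: g.open_slots)
        else:
            game = GameInstance(next(self._ids), self.capacity, pending=True)
            self.instances[game.instance_id] = game
        game.reserved += count
        return game

    def cancel(self, instance_id: int, count: int = 1) -> None:
        """Release held spots so they can go to someone else."""
        game = self.instances[instance_id]
        game.reserved = max(0, game.reserved - count)
```

Counting reserved spots against capacity is what keeps two players who press “Play” at the same moment from being promised the same slot.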
The Game State Datastore can handle a wide range of complicated scenarios. Say a party of 10 people wants to join a game that is running off of two servers, neither of which has enough open slots. The Game State Datastore would then launch a new game instance and reserve 10 spots in it immediately. What if someone decides they want to play something else in the middle of this process? The Game State Datastore knows to immediately clear the reservation spot in the game and give it to another person waiting to play. This all happens extremely quickly. We’ve even got a contingency plan in the unlikely scenario that the Game State Datastore crashes: every game server “calls home” at least once every 30 seconds, and also each and every time a player joins or leaves any particular game. So if we lose data due to a crash, we can recreate it almost instantly from the data our game servers send back during the “call.”
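Here is a rough sketch of why the “call home” design makes recovery cheap. It assumes, purely for illustration, that each call carries the full list of players in that game instance; the `RebuildableGameState` class and the `call_home` signature are hypothetical.

```python
import time
from typing import Dict, Iterable, Optional, Set

HEARTBEAT_INTERVAL_SECONDS = 30  # from the article: at least one call every 30 seconds


class RebuildableGameState:
    """Hypothetical store that can be refilled purely from game-server calls."""

    def __init__(self) -> None:
        self.players_by_instance: Dict[str, Set[int]] = {}
        self.last_seen: Dict[str, float] = {}

    def call_home(
        self,
        instance_id: str,
        player_ids: Iterable[int],
        now: Optional[float] = None,
    ) -> None:
        # Each call is a full snapshot, so it overwrites whatever we had,
        # including nothing at all right after a crash.
        self.players_by_instance[instance_id] = set(player_ids)
        self.last_seen[instance_id] = time.monotonic() if now is None else now
```

Because every call replaces the stored state for that instance, an empty store that has just restarted converges back to the full picture within one interval, without replaying any history.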
Server State Datastore
The Server State Datastore tracks performance metrics for all available game servers. Its job is to decide which game server should launch a new game instance when there are no available slots for players to join. To do this, the Server State Datastore has to take into account various performance characteristics of each server: available CPU capacity, available RAM, and available network bandwidth all factor into the choice. Once these things are determined, the Server State Datastore chooses the best available server to launch a new game.
This method of allocation based on performance metrics ensures that our game servers are used in the most efficient way, and that no single game server is overloaded. The result of this complex process? More games with less lag, inhabiting fewer servers. That’s a win for everybody.
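As a rough illustration of metric-based selection, the sketch below picks the server whose scarcest resource has the most room left. The scoring rule, the `ServerMetrics` fields, and the `min_headroom` threshold are assumptions made for this example; the article does not specify how the real metrics are weighted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServerMetrics:
    name: str
    cpu_free: float        # fraction of CPU capacity available, 0.0 to 1.0
    ram_free: float        # fraction of RAM available, 0.0 to 1.0
    bandwidth_free: float  # fraction of network bandwidth available, 0.0 to 1.0


def headroom(server: ServerMetrics) -> float:
    """A server is only as free as its most constrained resource."""
    return min(server.cpu_free, server.ram_free, server.bandwidth_free)


def pick_server(
    servers: Iterable[ServerMetrics], min_headroom: float = 0.1
) -> Optional[ServerMetrics]:
    """Return the server with the most headroom, or None if all are too busy."""
    eligible = [s for s in servers if headroom(s) >= min_headroom]
    return max(eligible, key=headroom, default=None)
```

Scoring by the minimum rather than the average keeps a server with plenty of RAM but a saturated CPU from being chosen.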
This is different from our previous method in several key ways. Before, a centralized server would talk to game servers and tell them which games to start. With our new method, the responsibility is reversed. The game servers ask what to do instead of being told. This makes the process much faster and dramatically simplifies the architecture of the overall system.
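One way to picture the reversed responsibility is a queue of launch requests that game servers poll. This is only a sketch of the general “pull” pattern under assumed names (`LaunchQueue`, `request_launch`, `ask_for_work`); it is not a description of the actual protocol.

```python
from collections import deque
from typing import Deque, Optional


class LaunchQueue:
    """Hypothetical pull-based dispatcher: servers ask, the queue answers."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def request_launch(self, game_name: str) -> None:
        """Called when matchmaking needs a new instance of a game."""
        self._pending.append(game_name)

    def ask_for_work(self, server_has_capacity: bool) -> Optional[str]:
        """Called by a game server; returns a game to start, or None."""
        if not server_has_capacity or not self._pending:
            return None
        return self._pending.popleft()
```

With this shape, the central component never needs to open connections to game servers or track which ones are reachable; a server that is down simply stops asking.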
All of these changes have taken place over the last couple of months. The fact that you’ve noticed nothing is actually the best news we could report. We want ROBLOX to become faster, smoother, and more efficient, and we’re constantly working on the behind-the-scenes tech that enables such improvements. Stay tuned to the blog to learn about another technique we’re developing to get you into responsive servers quickly!