The post How Roblox Helps Developers Create, Scale, and Monetize appeared first on Roblox Blog.
We want to empower any developer or creator to make anything anywhere on our platform, and at this week’s Game Developers Conference (GDC) in San Francisco, we showcased many of the ways we help people create, scale, and monetize.
We unveiled two new AI tools that can significantly speed up creation on Roblox. In addition, we announced the evolution of our Creator Fund (previously known as Game Fund). As before, it focuses on funding next-level experiences with innovative gameplay, ambitious visual designs, and original ideas. We’re also excited to expand the types of content in the program. That means we’ll be bringing beloved off-platform IP to Roblox, including Paramount’s iconic Avatar: The Last Airbender, as well as content beyond games, like Neura Studios’ brand-new release, Clip It.
Roblox and members of our creator community also hosted sessions on topics ranging from doing business on Roblox to generative AI to the differences between free-to-play mobile creation and creating on Roblox. We want to thank the Roblox creators who shared their expertise onstage: @badcc, @Andybloxstar, @Erythia, @Coffeenerd, @CDDevelopment, @EncodedLua, @Toya_Studio, and more.
Here’s a deeper look at everything we presented this week that we’re confident will help enable more creators than ever to achieve their goals on Roblox.
One of the most important things we do for Roblox creators is develop tools and technologies that make creating 3D content easier and faster. So this week at GDC, we announced two new AI tools — Avatar Auto Setup and Texture Generator — that do just that.
With Avatar Auto Setup, it will be simpler than ever to create an avatar by quickly and automatically converting a 3D model into a fully animated avatar that people can use right away. This industry-leading tool also provides the ability to add facial animation to avatars and can cut the time it takes to set up an avatar from days to minutes.
With Texture Generator, creators will be able to use text prompts to quickly change and customize how 3D objects look. The textures the tool produces automatically conform to the shape of objects, significantly reducing the work required to bring an object to life.
These are the latest examples of tools that can help creators move faster from an idea to reality in Roblox Studio. This free, advanced 3D development software makes it simple for almost anyone to create anything they can imagine on Roblox. Built on our platform’s multiplayer, real-world simulation engine, it provides creators with out-of-the-box access to advanced physics, aerodynamics, our growing suite of innovative AI solutions, and so much more.
At GDC, we delved into how creators on Roblox do best when they have access to transparent data showing how their experiences are performing and where there’s room to grow. We’ve been building out our robust analytics suite, which gives creators actionable performance insights, allowing them to adjust their content strategies to reflect how their experiences are doing. And they can rapidly iterate based on those insights by publishing updates in seconds, anytime they want, to mobile, desktop, console, and VR simultaneously.
We also explored our principles and plans for Discovery on Roblox, which helps connect creators with their ideal audiences and encourages them to continue improving their creations. At scale, that provides a greater diversity of terrific content for our users, connecting them with the creations and communities that best match their interests.
Roblox features an algorithm that continuously refreshes the creations and updates users see. It’s part of a healthy discovery system built on network effects, engagement, and monetization that allows creators to maximize their reach with our global audience.
One great example is Gunflight Studio’s Gunfight Arena, which Roblox users have visited more than 228 million times since it launched last October. The team iterated quickly, testing changes frequently to reach its business goals and improve the experience’s discoverability. And by optimizing for performance, the team made Gunfight Arena the most popular first-person shooter on Roblox.
Creators of all sizes, from individuals to large studios, are demonstrating the many ways anyone can create and monetize their work on Roblox. In 2023, our more than 25 million creators collectively earned $741 million, up 19 percent year over year, and we’re always looking to give them more opportunities to succeed. When they do, our entire ecosystem does too. So our goal is to empower all creators with a wide range of ways to earn money on Roblox, including in-experience purchases, immersive ads, and selling avatar items or creator plugins.
We want to thank everyone who came out to see us at GDC. To learn more about what Roblox offers creators, please visit our Creator Hub page. We’re excited to see how our creator community grows and thrives in the months and years to come.
The post Roblox ML Engineer Xiao Yu Receives Test of Time Award appeared first on Roblox Blog.
The winning paper, “Personalized Entity Recommendation: A Heterogeneous Information Network Approach,” was first presented at WSDM 2014, while Yu was a researcher at the University of Illinois at Urbana-Champaign. Yu joined Roblox in 2022 and has worked on natural language processing, computer vision, large language models, and generative AI, including our recent work on real-time AI chat translation and real-time voice moderation.
Yu says the award-winning paper “introduces the concept of meta-path-based latent features as the representations for users and items. This was before representation learning became state-of-the-art for recommender systems. Though it predates the widespread use of embeddings in heterogeneous networks and recommender systems, the observations and philosophy presented in this paper inspired many researchers to reexamine this problem and sparked a wave of innovative research in this domain.”
The research published by Yu and colleagues has gained significant recognition over the past decade as recommendation engines have become increasingly ubiquitous. “By incorporating diverse relationship information, our method personalizes recommendations to a greater extent, leading to more accurate, relevant, and customized suggestions for users. This is crucial in today’s information overload scenario, where people are bombarded with irrelevant recommendations,” Yu says.
“Prior to this paper, graph-based hybrid recommender systems often utilized a single type of relationship, like whether a user had purchased a certain item before. This was one of the first approaches to leverage the relationship heterogeneity within a network. By modeling various relationships, the proposed recommender system can capture a richer and more nuanced understanding of user preferences and item characteristics.”
Learn about recent AI research at Roblox here.
The post Breaking Down Language Barriers with a Multilingual Translation Model appeared first on Roblox Blog.
In any experience that has enabled our in-experience text chat service, people from different countries can now be understood by people who don’t speak their language. The chat window will automatically show Korean translated into English, or Turkish translated into German, and vice versa, so that each person sees the conversation in their own tongue. These translations are displayed in real time, with latency of approximately 100 milliseconds, so the translation happening behind the scenes is nearly invisible. Using AI to automate real-time translations in text chat removes language barriers and brings more people together, no matter where they live in the world.
AI translation is not new; the majority of our in-experience content is already automatically translated. But we wanted to go beyond translating static content in experiences. We wanted to automatically translate interactions, and we wanted to do that for all 16 languages we support on the platform. This was an audacious goal for two reasons. First, we weren’t just translating from one primary language (i.e., English) to another; we wanted a system capable of translating between any combination of the 16 languages we support. Second, it had to be fast enough to support real chat conversations, which to us meant getting latency down to approximately 100 milliseconds.
Roblox is home to more than 70 million daily active users all over the world and growing. People are communicating and creating on our platform — each in their native language — 24 hours a day. Manually translating every conversation happening across more than 15 million active experiences, all in real time, is obviously not feasible. Scaling these live translations to millions of people, all having different conversations in different experiences simultaneously, requires an LLM with tremendous speed and accuracy. We need a context-aware model that recognizes Roblox-specific language, including slang and abbreviations (think obby, afk, or lol). Beyond all of that, our model needs to support any combination of the 16 languages Roblox currently supports.
To achieve this, we could have built a separate model for each language pair (e.g., Japanese and Spanish), but covering every translation direction between 16 languages would have required 16×15, or 240, different models. Instead, we built a unified, transformer-based translation LLM that handles all language pairs in a single model. This is like having multiple translation apps, each specializing in a group of similar languages, all available through a single interface. Given a source sentence and a target language, we can activate the relevant “expert” to generate the translation.
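The post does not describe how the relevant expert is chosen. As a rough illustration only, here is a generic sparse mixture-of-experts gate in plain Python: a router scores each expert, only the top-k experts are evaluated, and their outputs are combined using gate weights renormalized over the chosen experts. The gate weights, the toy experts, and the function names are hypothetical. In many mixture-of-experts transformers the gate sits inside individual layers and routes per token; this sketch routes a single vector for simplicity.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: Sequence[Vector],
                experts: Sequence[Expert], top_k: int = 1) -> Vector:
    """Sparse mixture of experts: score experts, run only the top_k, mix outputs."""
    # Router: one logit per expert, the dot product of that expert's gate row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Keep the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)  # toy assumption: experts preserve dimensionality
    for i in chosen:
        y = experts[i](x)
        weight = probs[i] / norm  # renormalize gate weights over the chosen experts
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

With `top_k=1`, each input pays the cost of a single expert, which is the resource saving the next paragraph alludes to.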
This architecture allows for better utilization of resources, since each expert has a different specialty, which leads to more efficient training and inference — without sacrificing translation quality.
This architecture makes it far more efficient to train and maintain our model for a few reasons. First, our model can leverage linguistic similarities between languages. When all languages are trained together, similar languages, like Spanish and Portuguese, benefit from each other’s input during training, which helps improve translation quality for both. Second, we can far more easily test and integrate new research and advances in LLMs into our system as they’re released, so we benefit from the latest techniques available. Third, the unified model helps when the source language is not set or is set incorrectly: the model is accurate enough to detect the correct source language and translate into the target language. It can even handle input that mixes several languages. In these cases, the accuracy may not be quite as high, but the final message will be reasonably understandable.
To train this unified model, we began by pretraining on available open source data, as well as our own in-experience translation data, human-labeled chat translation results, and common chat sentences and phrases. We also built our own translation evaluation metric and model to measure translation quality. Most off-the-shelf translation quality metrics compare the AI translation result to some ground truth or reference translation and focus primarily on the understandability of the translation. We wanted to assess the quality of the translation — without a ground truth translation.
We look at this from multiple aspects, including accuracy (whether there are any additions, omissions, or mistranslations), fluency (punctuation, spelling, and grammar), and incorrect references (discrepancies with the rest of the text). We classify these errors into severity levels: critical, major, or minor. To assess quality, we built an ML model and trained it on human-labeled error types and scores. We then fine-tuned a multilingual language model to predict word-level errors and their types, and we calculate a score from those predictions using our multidimensional criteria. This gives us a comprehensive understanding of the quality and types of errors occurring. In this way, we can estimate translation quality and detect errors using only the source text and the machine translation, without requiring a ground truth translation. Using the results of this quality measure, we can further improve our translation model.
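To make the severity idea concrete, here is a minimal scoring sketch in the style of MQM (Multidimensional Quality Metrics). It assumes an error-prediction model has already emitted a list of errors, each with a category and a severity, for one translated message. The severity weights (1, 5, 10) and the per-100-words normalization are illustrative assumptions, not the weights Roblox uses.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative severity weights (an assumption, in the style of MQM scoring).
SEVERITY_WEIGHTS: Dict[str, float] = {"minor": 1.0, "major": 5.0, "critical": 10.0}


@dataclass(frozen=True)
class TranslationError:
    category: str  # e.g. "accuracy", "fluency", "reference"
    severity: str  # "minor", "major", or "critical"


def quality_score(errors: List[TranslationError], num_words: int) -> float:
    """Reference-free score in [0, 100]: 100 minus the severity-weighted
    penalty per 100 words of translation, clipped at 0."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    score = 100.0 - 100.0 * penalty / num_words
    return max(0.0, score)


def errors_by_category(errors: List[TranslationError]) -> Dict[str, int]:
    """Count errors per category to see which kinds of mistakes dominate."""
    counts: Dict[str, int] = {}
    for e in errors:
        counts[e.category] = counts.get(e.category, 0) + 1
    return counts
```

No reference translation appears anywhere in the inputs, which is the property the paragraph above emphasizes.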
Less common translation pairs (say, French to Thai) are challenging due to a lack of high-quality data. To address this gap, we applied back translation, where content is translated back into the original language, then compared to the source text for accuracy. During the training process, we used iterative back translation, in which a strategic mix of this back-translated data and supervised (labeled) data expands the amount of translation data the model can learn from.
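Here is a sketch of one round of back translation as it is commonly formulated for data augmentation. Real text in the target language is translated into the source language by a reverse model, which yields synthetic (source, target) pairs. Those pairs are then mixed with supervised pairs. `reverse_model` stands in for any target-to-source translator, and `synthetic_fraction` is a hypothetical mixing knob. In iterative back translation, the models are retrained on the enlarged data and the round repeats.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(target_texts: List[str],
                   reverse_model: Callable[[str], str]) -> List[Pair]:
    """Make synthetic pairs: the target side is real text, and the source side
    is produced by a target-to-source model."""
    return [(reverse_model(t), t) for t in target_texts]


def build_training_mix(supervised: List[Pair], synthetic: List[Pair],
                       synthetic_fraction: float, seed: int = 0) -> List[Pair]:
    """Keep every supervised pair and add enough synthetic pairs that they make
    up about `synthetic_fraction` of the result (capped by what is available)."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    wanted = round(len(supervised) * synthetic_fraction / (1.0 - synthetic_fraction))
    rng = random.Random(seed)  # seeded so the mix is reproducible
    mix = supervised + rng.sample(synthetic, min(wanted, len(synthetic)))
    rng.shuffle(mix)
    return mix
```

Keeping the target side real is the usual design choice. The model always learns to produce fluent human text, even when the source side is machine-generated.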
To help the model understand modern slang, we asked human evaluators to translate popular and trending terms for each language, and included those translations in our training data. We will continue to repeat this process regularly to keep the system up to date on the latest slang.
The resulting chat translation model has roughly 1 billion parameters. A model this large is too resource-intensive to serve at scale and too slow for real-time conversation, where low latency is critical and the system must handle more than 5,000 chats per second. So we used this large translation model as the teacher in a student-teacher approach to build a smaller, lighter-weight model. We applied distillation, quantization, model compilation, and other serving optimizations to reduce the model to fewer than 650 million parameters and improve serving efficiency. In addition, we modified the API behind in-experience text chat to send both the original and the translated messages to the person’s device. This enables the recipient to see the message in their native language or quickly switch to see the sender’s original, untranslated message.
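Two of the techniques named above, student-teacher distillation and quantization, can be sketched generically. The first function computes a standard distillation loss: the KL divergence between temperature-softened teacher and student output distributions, scaled by T^2. The second applies symmetric per-tensor int8 quantization. Both functions operate on plain Python lists for readability. The temperature and the int8 scheme are assumptions, because the post does not say which variants Roblox used.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax_t(logits: Sequence[float], temperature: float) -> List[float]:
    """Log-softmax of logits / temperature, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so the
    gradient magnitude stays comparable across temperatures."""
    log_p = log_softmax_t(teacher_logits, temperature)
    log_q = log_softmax_t(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: each w is approximately scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]
```

The rounding error of each dequantized weight is at most half of `scale`. That bounded error is the trade-off that buys a 4x smaller weight footprint than 32-bit floats.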
Once the final LLM was ready, we implemented a back end to connect with the model servers. This back end is where we apply additional chat translation logic and integrate the system with our usual trust and safety systems. This ensures translated text gets the same level of scrutiny as other text, so we can detect and block words or phrases that violate our policies. Safety and civility are at the forefront of everything we do at Roblox, so this was a very important piece of the puzzle.
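The serving flow described here, and in the previous paragraph, can be pictured roughly as follows. The message is translated if needed, both versions go through the same safety check as any other text, and both the original and the translation are sent to the recipient's device. `translate` and `is_allowed` are hypothetical stand-ins for the model server and the trust and safety filter. Blocking the whole message when either version fails is an assumed policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Translator = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
SafetyCheck = Callable[[str], bool]          # True if the text is allowed


@dataclass(frozen=True)
class ChatPayload:
    original: str              # sender's text, so the recipient can switch back to it
    translated: Optional[str]  # None when no translation is needed
    source_lang: str
    target_lang: str


def prepare_message(text: str, source_lang: str, target_lang: str,
                    translate: Translator, is_allowed: SafetyCheck) -> Optional[ChatPayload]:
    """Translate if needed, apply the same safety check to both versions, and
    return the payload to send, or None if the message is blocked."""
    if not is_allowed(text):
        return None
    translated = None
    if source_lang != target_lang:
        translated = translate(text, source_lang, target_lang)
        if not is_allowed(translated):
            return None  # assumed policy: block rather than fall back to the original
    return ChatPayload(text, translated, source_lang, target_lang)
```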
In testing, we’ve seen that this new translation system drives stronger engagement and session quality for the people on our platform. Based on our own metric, our model outperforms commercial translation APIs on Roblox content, indicating that we’ve successfully optimized for how people communicate on Roblox. We’re excited to see how this improves the experience for people on the platform, making it possible for them to play games, shop, collaborate, or just catch up with friends who speak a different language.
The ability for people to have seamless, natural conversations in their native languages brings us closer to our goal of connecting a billion people with optimism and civility.
To further improve the accuracy of our translations and to provide our model with better training data, we plan to roll out a tool to allow people on the platform to provide feedback on their translations and help the system improve even faster. This would enable someone to tell us when they see something that’s been mistranslated and even suggest a better translation we can add into the training data to further improve the model.
These translations are available today for all 16 languages we support — but we are far from done. We plan to continue to update our models with the latest translation examples from within our experiences as well as popular chat phrases and the latest slang phrases in every language we support. In addition, this architecture will make it possible to train the model on new languages with relatively low effort, as sufficient training data becomes available for those languages. Further out, we’re exploring ways to automatically translate everything in multiple dimensions: text on images, textures, 3D models, etc.
And we are already exploring exciting new frontiers, including automatic voice chat translations. Imagine a French speaker on Roblox being able to voice chat with someone who only speaks Russian. Both could speak to and understand one another, right down to the tone, rhythm, and emotion of their voice, in their own language, and at low latency. While this may sound like science fiction today, and it will take some time to achieve, we will continue to push forward on translation. In the not-too-distant future, Roblox will be a place where people from all around the world can seamlessly and effortlessly communicate not just via text chat, but in every possible modality!
The post Inside the Tech – Solving for Safety in Immersive Voice Communication appeared first on Roblox Blog.
We prioritize maintaining a safe and positive experience for our users. Safety and civility are always top of mind for us, but handling them in real time can be a big technical challenge. Whenever there’s an issue, we want to be able to review it and take action in real time, which is challenging at our scale. To handle this scale effectively, we need to leverage automated safety systems.
Another technical challenge we’re focused on is the accuracy of our safety measures for moderation. There are two moderation approaches to address policy violations and provide accurate feedback in real time: reactive and proactive. For reactive moderation, we’re developing machine learning (ML) models that respond to reports from people on the platform by accurately identifying different types of policy violations. For proactive moderation, we’re working on real-time detection of potentially policy-violating content and on educating users about their behavior. Understanding the spoken word and improving audio quality is a complex process. We’re already seeing progress, but our ultimate goal is a highly precise model that can detect policy-violating behavior in real time.
We have developed an end-to-end ML model that can analyze audio data and provide a confidence level for each type of policy violation (e.g., how likely this is to be bullying, profanity, etc.). This model has significantly improved our ability to automatically close certain reports. We take action only when the model is confident and we can be sure it outperforms humans. Within just a handful of months after launch, we were able to moderate almost all English voice abuse reports with this model. We developed these models in-house, and they are a testament to combining many open source technologies with our own work.
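Here is a minimal sketch of confidence-based routing. The model's per-category scores are compared against per-category thresholds. Only reports that clear a threshold are handled automatically, and the rest go to human review. The category names and threshold values are hypothetical. The helper `precision_at` shows one way a threshold could be checked against human-labeled reports before it is trusted, for example by requiring its precision to beat a human reviewer baseline.

```python
from typing import Dict, List, Tuple

# Hypothetical per-category thresholds; real values would be tuned on labeled data.
THRESHOLDS: Dict[str, float] = {"bullying": 0.95, "profanity": 0.90, "discrimination": 0.97}


def route_report(scores: Dict[str, float],
                 thresholds: Dict[str, float] = THRESHOLDS) -> str:
    """Act automatically only when some category clears its threshold;
    otherwise send the report to a human reviewer."""
    confident = [(score, category) for category, score in scores.items()
                 if category in thresholds and score >= thresholds[category]]
    if confident:
        _, category = max(confident)  # the most confident category wins
        return f"auto_action:{category}"
    return "human_review"


def precision_at(threshold: float, scored: List[Tuple[float, bool]]) -> float:
    """Fraction of reports scoring >= threshold that are true violations,
    given (model score, human label) pairs. Returns 1.0 if nothing is flagged."""
    flagged = [label for score, label in scored if score >= threshold]
    return sum(flagged) / len(flagged) if flagged else 1.0
```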
There’s a lot of thought put into making the system contextually aware. We also look at patterns over time before we take action so we can be sure that our actions are justified. Our policies are nuanced depending on a person’s age, whether they’re in a public space or a private chat, and many other factors. We are exploring new ways to promote civility in real time and ML is at the heart of it. We recently launched automated push notifications (or “nudges”) to remind users of our policies. We’re also looking into other factors like tone of voice to better understand a person’s intentions and distinguish things like sarcasm or jokes. Lastly, we’re also building a multilingual model since some people speak multiple languages or even switch languages mid-sentence. For any of this to be possible, we have to have an accurate model.
Currently, we are focused on addressing the most prominent forms of abuse, such as harassment, discrimination, and profanity. These make up the majority of abuse reports. Our aim is to have a significant impact in these areas and set the industry norms for what promoting and maintaining a civil online conversation looks like. We’re excited about the potential of using ML in real time, as it enables us to effectively foster a safe and civil experience for everyone.
Our Chat with Spatial Voice technology creates a more immersive experience, mimicking real-world communication. For instance, if I’m standing to the left of someone, they’ll hear me in their left ear. We’re creating an analog to how communication works in the real world, and this is a challenge we’re in a position to solve first.
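The left-ear/right-ear effect can be approximated with a standard constant-power stereo pan driven by the speaker's direction relative to the listener, plus a simple distance rolloff. This is a generic audio technique, not a description of Roblox's spatial voice pipeline. The 2D coordinates, the facing convention (an angle in radians from the +x axis), and the rolloff rule are all assumptions. Plain stereo panning also cannot distinguish front from back.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def stereo_gains(listener: Vec2, facing_rad: float, source: Vec2,
                 ref_distance: float = 1.0) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a sound at `source` heard by a
    listener at `listener` facing `facing_rad` (radians from the +x axis)."""
    dx, dy = source[0] - listener[0], source[1] - listener[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return math.sqrt(0.5), math.sqrt(0.5)  # co-located: centered, unattenuated
    # Angle of the source relative to where the listener faces (counterclockwise = left).
    relative = math.atan2(dy, dx) - facing_rad
    pan = -math.sin(relative)           # -1 = fully left, +1 = fully right
    phi = (pan + 1.0) * math.pi / 4.0   # map pan to [0, pi/2]
    attenuation = min(1.0, ref_distance / distance)  # simple inverse-distance rolloff
    # Constant-power law: left^2 + right^2 == attenuation^2 for every pan position.
    return attenuation * math.cos(phi), attenuation * math.sin(phi)
```

For a listener at the origin facing +y, a speaker at (-1, 0) produces gains of about (1, 0), so the voice comes entirely from the left.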
As a gamer myself, I’ve witnessed a lot of harassment and bullying in online gaming. It’s a problem that often goes unchecked due to user anonymity and a lack of consequences. However, the technical challenges we’re tackling here differ from those other platforms face in a couple of areas. On some gaming platforms, interactions are limited to teammates. Roblox offers a variety of ways to hang out in a social environment that more closely mimics real life. With advancements in ML and real-time signal processing, we’re able to effectively detect and address abusive behavior, so Roblox is not only a more realistic environment but also one where everyone feels safe to interact and connect with others. The combination of our technology, our immersive platform, and our commitment to educating users about our policies puts us in a position to tackle these challenges head on.
I feel like I’ve learned a great deal. I’m not an ML engineer; I’ve worked mostly on the front end in gaming, so just being able to go deeper than I have before into how these models work has been huge. My hope is that the actions we’re taking to promote civility translate into a level of empathy that has been lacking in the online community.
One last learning is that everything depends on the training data you put in. For the data to be accurate, humans have to agree on the labels used to categorize certain policy-violating behaviors. It’s really important to train on quality data that everyone can agree on, and that is a really hard problem to solve. You begin to see areas where ML is way ahead of everything else, and other areas where it’s still in the early stages, so being cognizant of its current limits is key.
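One common way to quantify whether human labelers agree is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes it for two annotators. The label values used in testing it are hypothetical, and the post does not say which agreement measure, if any, Roblox uses.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators independently pick the same label.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined, treat as full agreement
    return (observed - expected) / (1.0 - expected)
```

A kappa near 1 means the labels are consistent enough to train on. A value near 0 means annotators agree little more than chance, which signals that the labeling guidelines need work before the data is usable.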
Respecting the community is our guiding value throughout this process. First, we need to focus on improving civility and reducing policy violations on our platform. This has a significant impact on the overall user experience. Second, we must carefully consider how we roll out these new features. We need to be mindful of false positives (e.g. incorrectly marking something as abuse) in the model and avoid incorrectly penalizing users. Monitoring the performance of our models and their impact on user engagement is crucial.
We have made significant progress in improving public voice communication, but there is still much more to be done. Private communication is an exciting area to explore. I think there’s a huge opportunity to improve private communication, to allow users to express themselves to close friends, to have a voice call going across experiences or during an experience while they interact with their friends. I think there’s also an opportunity to foster these communities with better tools to enable users to self-organize, join communities, share content, and share ideas.
As we continue to grow, how do we scale our chat technology to support these expanding communities? We’re just scratching the surface on a lot of what we can do, and I think there’s a chance to improve the civility of online communication and collaboration across the industry in a way that has not been done before. With the right technology and ML capabilities, we’re in a unique position to shape the future of civil online communication.
The post Inside the Tech – Solving for Avatar Facial Expressions appeared first on Roblox Blog.
When we think about how an avatar represents someone on Roblox, we typically consider two things: How it behaves and how it looks. So one major focus for my team is enabling avatars to mirror a person’s expressions. For example, when someone smiles, their avatar smiles in sync with them.
One of the hard things about tracking facial expressions is tuning the efficiency of our model so that we can capture these expressions directly on the person’s device in real time. We’re committed to making this feature accessible to as many people on Roblox as possible, and we need to support a huge range of devices. The amount of compute power someone’s device can handle is a vital factor in that. We want everyone to be able to express themselves, not just people with powerful devices. So we’re deploying one of our first-ever deep learning models to make this possible.
The second key technical challenge we’re tackling is simplifying the process creators use to develop dynamic avatars people can personalize. Creating avatars like that is pretty complicated because you have to model the head and, if you want it to animate, you have to do very specific things to rig the model, like placing joints and weights for linear blend skinning. We want to make this process easier for creators, so we’re developing technology to simplify it. They should only have to focus on building the static model. When they do, we can automatically rig and cage it. Then facial tracking and layered clothing should work right off the bat.
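Linear blend skinning, mentioned above, deforms each vertex by a weighted average of its joints' transforms: v' = sum over joints j of w_j * (M_j v). The sketch below assumes each joint transform is a 3x4 affine matrix that already includes the inverse bind pose, and it assumes that a vertex's weights sum to 1. It is a textbook illustration, not Roblox's rigging code.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Affine = Sequence[Sequence[float]]  # 3x4 matrix [R | t], inverse bind pose folded in


def apply_affine(m: Affine, v: Vec3) -> Vec3:
    """Apply a 3x4 affine transform to a point."""
    x, y, z = v
    return (m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3],
            m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3],
            m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3])


def skin_vertex(rest: Vec3, joints: Sequence[Affine], weights: Sequence[float]) -> Vec3:
    """Linear blend skinning for one vertex: v' = sum_j w_j * (M_j v)."""
    if len(joints) != len(weights) or abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("need one weight per joint, summing to 1")
    out: List[float] = [0.0, 0.0, 0.0]
    for m, w in zip(joints, weights):
        if w == 0.0:
            continue  # joints with no influence are skipped
        p = apply_affine(m, rest)
        for i in range(3):
            out[i] += w * p[i]
    return (out[0], out[1], out[2])
```

Choosing these per-vertex weights by hand is the tedious part of rigging that automatic setup aims to remove.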
We’ve done a couple of important things to ensure we get the right information for facial expressions. That starts with using the industry-standard FACS (Facial Action Coding System). FACS controls are the key to everything because they’re what we use to drive an avatar’s facial expressions: how wide the mouth is, which eyes open and how much, and so on. We can use around 50 different FACS controls to describe a desired facial expression.
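A common way for FACS-style control weights to drive a face mesh is blendshape (morph target) interpolation. Each control has a target shape, and the posed mesh is the neutral mesh plus the weighted sum of each target's offset from neutral. The control name `JawDrop` and the tiny two-vertex mesh used to test the function are illustrative. The post does not specify how Roblox evaluates its FACS controls.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_facs_weights(neutral: Sequence[Vec3],
                       targets: Dict[str, Sequence[Vec3]],
                       weights: Dict[str, float]) -> List[Vec3]:
    """Posed mesh = neutral + sum_i w_i * (target_i - neutral), weights clamped to [0, 1]."""
    posed = [list(v) for v in neutral]
    for name, raw_weight in weights.items():
        w = max(0.0, min(1.0, raw_weight))  # FACS-style weights live in [0, 1]
        if w == 0.0:
            continue
        for i, (t, n) in enumerate(zip(targets[name], neutral)):
            for k in range(3):
                posed[i][k] += w * (t[k] - n[k])
    return [(v[0], v[1], v[2]) for v in posed]
```

Because the controls combine additively, an expression such as a smile with a slightly open jaw is simply a vector of about 50 weights, which is also a convenient output for a tracking model to predict.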
When you’re building a machine learning algorithm to estimate facial expressions from images or video, you train a model by showing it example images with known ground truth expressions (described with FACS). By showing the model many different images with different expressions, the model learns to estimate the facial expression of previously unseen faces.
Normally, when you’re working on facial tracking, these expressions are labeled by humans, and the easiest method is using landmarks—for example, placing dots on an image to mark the pixel locations of facial features like the corners of the eyes.
But FACS weights are different because you can’t look at a picture and say, “The mouth is open 0.9 vs. 0.5.” To solve for this, we’re using synthetic data: 3D models rendered in known FACS poses from different angles and under different lighting conditions, so the FACS weights for each image are known directly.
Unfortunately, because the model needs to generalize to real faces, we can’t train solely on synthetic data. So we pre-train the model on a landmark prediction task using a combination of real and synthetic data, which then allows the model to learn the FACS prediction task from purely synthetic data.
We want face tracking to work for everyone, but some devices are more powerful than others. This means we needed to build a system capable of dynamically adapting itself to the processing power of any device. We accomplished this by splitting our model into a fast approximate FACS prediction phase called BaseNet and a more accurate FACS refinement phase called HiFiNet. During runtime, the system measures its performance, and under optimal conditions, we run both model phases. But if a slowdown is detected (for example, because of a lower-end device), the system runs only the first phase.
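The BaseNet/HiFiNet split can be pictured as a small scheduler that times each frame and drops the refinement phase once the smoothed frame cost exceeds a budget. The sketch below assumes a per-frame millisecond budget, an exponential moving average, and a minimum number of samples before deciding. `base_net` and `refine_net` are placeholders for the two model phases. For simplicity, the sketch only ever downgrades; a real system might periodically re-probe whether refinement fits again.

```python
import time
from typing import Callable, List, Optional

Frame = object  # placeholder for a camera frame
FacsWeights = List[float]


class PhaseScheduler:
    """Decides whether to run the refinement phase, based on smoothed frame cost."""

    def __init__(self, budget_ms: float, smoothing: float = 0.8, min_samples: int = 5) -> None:
        self.budget_ms = budget_ms
        self.smoothing = smoothing
        self.min_samples = min_samples
        self.samples = 0
        self.avg_ms: Optional[float] = None
        self.refine = True  # start with both phases under optimal conditions

    def record(self, elapsed_ms: float) -> None:
        self.samples += 1
        if self.avg_ms is None:
            self.avg_ms = elapsed_ms
        else:  # the exponential moving average damps one-off spikes
            self.avg_ms = self.smoothing * self.avg_ms + (1.0 - self.smoothing) * elapsed_ms
        if self.refine and self.samples >= self.min_samples and self.avg_ms > self.budget_ms:
            self.refine = False  # slowdown detected: fall back to the fast phase only


def track_frame(frame: Frame,
                base_net: Callable[[Frame], FacsWeights],
                refine_net: Callable[[Frame, FacsWeights], FacsWeights],
                scheduler: PhaseScheduler) -> FacsWeights:
    start = time.perf_counter()
    facs = base_net(frame)              # fast, approximate FACS prediction
    if scheduler.refine:
        facs = refine_net(frame, facs)  # slower, more accurate refinement
    scheduler.record((time.perf_counter() - start) * 1000.0)
    return facs
```

Feeding the base output into the refinement step means a slow device still gets a usable expression every frame, and only loses the extra accuracy.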
One lesson is that getting a feature to work is a small part of what it actually takes to release something successfully. A ton of the work is in the engineering and unit testing process. We need good ways of determining whether we have a good data pipeline. And we need to ask ourselves, “Hey, is this new model actually better than the old one?”
Even before we start the core engineering, the pipelines we put in place for tracking experiments, ensuring our dataset represents the diversity of our users, evaluating results, and deploying and getting feedback on new results are what make the model good enough to ship. That part of the process doesn’t get talked about as much, even though it’s critical.
Understanding the phase of a project is key, so during innovation, taking the long view matters a lot, especially in research when you’re trying to solve important problems. But respecting the community is also crucial when you’re identifying the problems that are worth innovating on because we want to work on the problems with the most value to our broader community. For example, we specifically chose to work on “face tracking for all” rather than just “face tracking.” As you reach the 90 percent mark of building something, transitioning a prototype into a functional feature hinges on execution and adapting to the project’s stage.
I’ve always gravitated toward working on tools that help people be creative. Creating something is special because you end up with something that’s uniquely yours. I’ve worked in visual effects and on various photo editing tools, using math, science, research, and engineering insights to empower people to do really interesting things. Now, at Roblox, I get to take that to a whole new level. Roblox is a creativity platform, not just a tool. And the scale at which we get to build tools that enable creativity is much bigger than anything I’ve worked on before, which is incredibly exciting.
The post 2023 Year in Review — A Letter From Our CEO appeared first on Roblox Blog.
Throughout 2023, we shipped a broad range of innovations that helped us make significant progress towards our core goals:
Take a look at some key highlights above, and read on for more detail on the great work we did together:
Immersive Communication
Over the last year, we’ve added a range of new tools that make connection and communication on Roblox even more like real life. We’ve made it simple for people to bring their friends and family onto Roblox by importing their contact list from their phone or sharing a QR code with a friend. We also launched Chat with Voice, and we’ve been working to ensure that all avatar heads can utilize facial animation. These advances enable new, richer ways for people to express themselves just like they would in the real world. This year, we also began allowing people to customize friends’ names, including using their real names (Q4 23) if they’re over 17.
In 2023, improvements we made to our advanced real-time communication tools created new opportunities for our developers. For example, we introduced a technology that developers can incorporate into their experiences to allow people to call one another as their avatars for impromptu conversations and gatherings (Q4 23). This technology uses face tracking, so people can actually see each other’s facial expressions and body language while they’re communicating, which really helps them feel closer. In fact, we released our own experience based on this technology, called Roblox Connect, and I’ve been using it to talk with my family on Roblox, which has just been amazing.
For this technology, it’s still early. Our job is to provide infrastructure and tools for the community and to see what our developers build. And I’m always astonished by their creativity as more and more becomes possible on the platform.
Expressing Yourself With Avatars You Love
We believe that when someone can express their authentic self with their avatar, they’ll have a better time on Roblox. So this year, we gave our community new ways to do that. This could mean building an avatar that looks like the person does in real life, or one that resembles a jetpack-wearing ninja, an emo princess, or almost anything else they can dream up. Here’s how:
Next year, we’ll double down on developing generative AI tools that let people create avatars based on images and text prompts. We’re being thoughtful about moderating the avatars created with AI and the interactions people have with them, and ensuring it’s done with safety and civility in mind.
At the same time, we always want Roblox to be simple to use, so we’re making it easier than ever for creators to be sure their experiences work for everyone.
The Future of Creation
Everything you see on Roblox was imagined and built by our community, and we’re continuing to evolve our platform to empower more and more of that creativity:
Creator Connections
Roblox wouldn’t be where it is without our talented, diverse, and global creator community. This year, we rolled out our new Creator Roadmap to reflect how we’re building our platform hand-in-hand with creators. The Roadmap is a place for us to give our creators an early look at the things we’re working on, collaborate, and get their feedback.
In 2023, we connected with development studios from around the world both in person and virtually. And at Roblox Developers Conference (RDC) in September, thousands of people came together for informative and inspirational sessions, booth demos, and networking events. RDC continues to be a great place for developers, brands, and influencers to learn from each other and for us to connect with our talented and growing community.
At Connect 2023, our conference by creators for creators, more than 12,000 creators gathered for a series of networking opportunities and competitions. And with the launch of Creator Events, we saw creators from all over the world come together to share knowledge and learn from one another by hosting dozens of online workshops.
Community Creations
This was a year of breakout hits on Roblox. In fact, 38 percent of the top 1,000 experiences on the platform were created within the last 12 months.*** A sampling of some of the exciting new experiences on Roblox includes:
At the same time, our Game Fund debuted ten experiences that leverage our latest innovations and push the boundaries of what’s possible on Roblox. Those releases included Skullbeat by Chop Chop Games & Kong Orange, Primal Hunt by Phaser Lock Interactive, and—from the Hello Neighbor universe—Secret Neighbor by Tinybuild.
I’m happy to say that this growth extends beyond just experience creation. During the first three quarters of 2023, community creators sold nearly 1.6 billion digital fashion items and accessories. As a result, users were constantly changing their look, with 165 billion avatar updates over that same timeframe.
Honoring the Community
In November of this year, we awarded 16 winners at the Roblox Innovation Awards. Great creators at LSPLASH won both the Builderman Award of Excellence and Best New Experience for Doors, and Gamer Robot Inc. won the People’s Choice award for Blox Fruits.
We also saw incredible work from influencers in our Video Stars program like Tanqr who won Best Video Star, Temprist for creating the Best Video, and TeraBrite and RussoPlays for delivering yet another community smash hit season of RB Battles. This year we welcomed a diverse cast of more than 50 new video creators into the program, and we can’t wait to see what they continue to create.
I’m so proud that Roblox is home to one of the world’s most passionate and innovative creator communities, and so impressed by their monumental year of creativity and innovation.
We’ve long talked about the multiple elements driving the growth of Roblox—being a platform for everyone, being available everywhere, growing internationally, and building a vibrant economy that serves our entire community. By combining all of those elements, the growth of our platform gets very interesting very quickly.
Roblox for Everyone
In Q3 2023, more than 57 percent of our users were 13 or older, and the fastest-growing age group on Roblox was 17- to 24-year-olds. And we believe there is much more growth to come with this audience.
This year, we introduced experiences for 17+ users (Q2 23), which means creators can incorporate the kinds of mature themes and storylines found on TV shows or stand-up comedy into what they build. This is exciting because brands on Roblox now have access to a valuable, often hard-to-reach demographic, and because many developers over 17, the group that creates the majority of our top 1,000 experiences, want to build for older users so they can express themselves more freely.
Roblox Everywhere
For years, people have joined Roblox from a wide range of devices, including mobile (iOS and Android), desktop, and gaming consoles.
Making Roblox available everywhere and on every platform is a big focus for us so we can be where our users want to join us. This year, we expanded that range by bringing Roblox to Meta Quest (fully available in Q3 23) and PlayStation (Q4 23). Millions more people can now access our platform, which opens up even more opportunities for developers to create and instantly share their experiences. They can now easily publish and distribute existing experiences to these popular global platforms or create unique, new experiences for VR or console.
Roblox Around the World
Our users come from all corners of the globe, and this year, we launched four new languages to better support people on Roblox in 180 countries. We also made significant progress in markets like Japan, where we grew DAUs 66 percent year-over-year** in Q3 2023. And we’re learning from those efforts. When we develop Roblox in a new market, we’re systematic about making sure the users there will immediately have a great experience, so we zero in on search and discovery, natural language translation quality, and the performance of our infrastructure. As we think about joining new markets, we focus on:
This playbook has been successful in Japan and India, and it’s guiding our work in other countries. For example, Bookings in Germany grew 75 percent year-over-year** in Q3 2023. We’re also excited about the big opportunities we’re seeing in major markets like Brazil and India.
A Vibrant Economy
The Roblox economy is designed to model and reflect real-world dynamics. We built it to offer seamless participation for anyone, serve our community, and make it possible for any creator to start and grow a business. From the beginning of October 2022 through the end of September 2023, creators earned $701 million in developer exchange fees on Roblox. Some ways we enhanced our economy this year include:
Monetization and Advertising
For years, Roblox has been a powerful monetization engine, but with advertising and, in the future, real-world commerce, we see substantial potential for financial growth. Today, only about 20 percent of engagement hours on Roblox are generated by monthly unique payers. But the potential for advertising, due to our large and growing Gen Z audience, and eventually, real-world commerce, highlights new opportunities for creators, including brands, to expand the ways they can earn.
This comes as we’ve made it easier for brands and developers alike to earn from their experiences by displaying ads in them. And we’re reducing the barrier to entry for those that want to display ads with tools like:
More Brands and Industries
We have always aimed to make Roblox more valuable to more people and in 2023, we saw top brands and talent engaging and creating deep connections with our community. Brands like Adidas, NBA, and e.l.f. Cosmetics all created incredible experiences on the platform, artists like Nicki Minaj and Olivia Rodrigo (Q4 23) introduced immersive shopping experiences, and Karlie Kloss (Q1 23) and Paris Hilton (Q3 2024) launched their fashion-forward experiences with new avatars. Each of these used our powerful creation tools to build community spaces for connection and self-expression.
We also want to scale brand innovation and enable a self-serve, global advertising ecosystem on the platform, so we launched the Roblox Partner Program (Q2 23). The Program is focused on engaging a broad network of platform advocates—from Roblox developer studios to early adopters among agencies and third-party sellers—in global education and best practice sharing for brands.
This year, we’ve also seen expanded and engaging experiences in industries like fashion, music, auto, travel, work/recruiting, and education.
Behind all the work everyone at Roblox and in our community did this year is the infrastructure that allows us to scale rapidly and efficiently, empowering our developers to create with ease and our users to have the most reliable experience possible.
We’re also continuing to invest in the technical prowess that will underpin our ongoing growth and success. In particular, we will continue to invest in AI. We have access to unique data and insights that will allow us to leverage AI to benefit our community and platform, including:
This is both our short- and long-term future. Looking back at 2023, I’m so proud of all the work our entire team did to make Roblox such a powerful platform, and I couldn’t be more optimistic about what’s coming next.
The post How We’re Making Roblox’s Infrastructure More Efficient and Resilient appeared first on Roblox Blog.
Our infrastructure currently supports more than 70 million daily active users around the world, including the creators who rely on Roblox’s economy for their businesses. All of these millions of people expect a very high level of reliability. Given the immersive nature of our experiences, there is an extremely low tolerance for lags or latency, let alone outages. Roblox is a platform for communication and connection, where people come together in immersive 3D experiences. When people are communicating as their avatars in an immersive space, even minor delays or glitches are more noticeable than they are on a text thread or a conference call.
In October 2021, we experienced a system-wide outage. It started small, with an issue in one component in one data center. But it spread quickly as we were investigating and ultimately resulted in a 73-hour outage. At the time, we shared both details about what happened and some of our early learnings from the issue. Since then, we’ve been studying those learnings and working to increase the resilience of our infrastructure to the types of failures that occur in all large-scale systems due to factors like extreme traffic spikes, weather, hardware failure, software bugs, or just humans making mistakes. When these failures occur, how do we ensure that an issue in a single component, or group of components, does not spread to the full system? This question has been our focus for the past two years, and while the work is ongoing, what we’ve done so far is already paying off. For example, in the first half of 2023, we saved 125 million engagement hours per month compared to the first half of 2022. Today, we’re sharing the work we’ve already done, as well as our longer-term vision for building a more resilient infrastructure system.
Within large-scale infrastructure systems, small-scale failures happen many times a day. If one machine has an issue and has to be taken out of service, that’s manageable because most companies maintain multiple instances of their back-end services. So when a single instance fails, others pick up the workload. To address these frequent failures, requests are generally set to automatically retry if they get an error.
This becomes challenging when a system or person retries too aggressively, which can become a way for those small-scale failures to propagate throughout the infrastructure to other services and systems. If the network or a user retries persistently enough, it will eventually overload every instance of that service, and potentially other systems, globally. Our 2021 outage was the result of something that’s fairly common in large-scale systems: A failure starts small then propagates through the system, getting big so quickly it’s hard to resolve before everything goes down.
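A rough back-of-the-envelope model (our own simplification, not a model from the post) shows why aggressive retries let a small failure spread so quickly. If every layer in a chain of services makes one attempt plus `retries_per_layer` retries against a dependency that keeps failing, the deepest service can see up to (1 + r)^d attempts for a single user request. The function name and parameters are illustrative.

```python
def worst_case_attempts(retries_per_layer: int, layers: int) -> int:
    """Upper bound on attempts that reach the deepest service when each of
    `layers` callers makes one attempt plus `retries_per_layer` retries
    against a dependency that keeps failing."""
    return (1 + retries_per_layer) ** layers


for depth in (1, 3, 5):
    # One user request, three retries at every layer.
    print(depth, worst_case_attempts(3, depth))
# prints 1 4, then 3 64, then 5 1024
```

With three retries per layer, five layers can turn one request into 1,024 attempts against the failing service, which is how a fault in one component can end up saturating every instance of it.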
At the time of our outage, we had one active data center (with components within it acting as backup). We needed the ability to fail over manually to a new data center when an issue brought the existing one down. Our first priority was to ensure we had a backup deployment of Roblox, so we built that backup in a new data center, located in a different geographic region. That added protection for the worst-case scenario: an outage spreading to enough components within a data center that it becomes entirely inoperable. We now have one data center handling workloads (active) and one on standby, serving as backup (passive). Our long-term goal is to move from this active-passive configuration to an active-active configuration, in which both data centers handle workloads, with a load balancer distributing requests between them based on latency, capacity, and health. Once this is in place, we expect to have even higher reliability for all of Roblox and be able to fail over nearly instantaneously rather than over several hours.
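The post names the signals an active-active load balancer would use (latency, capacity, and health) but not how it combines them. The sketch below is a hypothetical illustration only: the `DataCenter` fields, the `min_spare` headroom threshold, and the scoring rule (latency divided by spare capacity) are all assumptions, not Roblox’s design.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    healthy: bool          # hypothetical output of health checks
    latency_ms: float      # observed latency from the caller's region
    spare_capacity: float  # fraction of capacity still free, 0.0 to 1.0


def pick_data_center(dcs: list[DataCenter], min_spare: float = 0.1) -> DataCenter:
    """Choose a healthy data center with enough headroom, preferring low
    latency and penalizing data centers that are close to full."""
    if min_spare <= 0:
        raise ValueError("min_spare must be positive")
    candidates = [dc for dc in dcs if dc.healthy and dc.spare_capacity >= min_spare]
    if not candidates:
        raise RuntimeError("no healthy data center with spare capacity")
    # Lower score wins: latency is inflated as spare capacity shrinks.
    return min(candidates, key=lambda dc: dc.latency_ms / dc.spare_capacity)


dcs = [DataCenter("east", True, 20.0, 0.5), DataCenter("west", True, 35.0, 0.9)]
print(pick_data_center(dcs).name)  # west: 35 / 0.9 ≈ 38.9 beats 20 / 0.5 = 40
```

Filtering on health before scoring means an unhealthy data center never receives traffic, regardless of how fast it looks, which is the behavior that would make near-instant failover possible.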
Our next priority was to create strong blast walls inside each data center to reduce the possibility of an entire data center failing. Cells (some companies call them clusters) are essentially a set of machines and are how we’re creating these walls. We replicate services both within and across cells for added redundancy. Ultimately, we want all services at Roblox to run in cells so they can benefit from both strong blast walls and redundancy. If a cell is no longer functional, it can safely be deactivated. Replication across cells enables the service to keep running while the cell is repaired. In some cases, cell repair might mean a complete reprovisioning of the cell. Across the industry, wiping and reprovisioning an individual machine, or a small set of machines, is fairly common, but doing this for an entire cell, which contains ~1,400 machines, is not.
For this to work, these cells need to be largely uniform, so we can quickly and efficiently move workloads from one cell to another. We have set certain requirements that services need to meet before they run in a cell. For example, services must be containerized, which makes them much more portable and prevents anyone from making configuration changes at the OS level. We’ve adopted an infrastructure-as-code philosophy for cells: In our source code repository, we include the definition of everything that’s in a cell so we can rebuild it quickly from scratch using automated tools.
Not all services currently meet these requirements, so we’ve worked to help service owners meet them where possible, and we’ve built new tools to make it easy to migrate services into cells when ready. For example, our new deployment tool automatically “stripes” a service deployment across cells, so service owners don’t have to think about the replication strategy. This level of rigor makes the migration process much more challenging and time consuming, but the long-term payoff will be a system where:
Similar to the way fire doors are used to contain flames, cells act as strong blast walls within our infrastructure to help contain whatever issue is triggering a failure within a single cell. Eventually, all of the services that make up Roblox will be redundantly deployed inside of and across cells. Once this work is complete, issues could still propagate wide enough to make an entire cell inoperable, but it would be extremely difficult for an issue to propagate beyond that cell. And if we succeed in making cells interchangeable, recovery will be significantly faster because we’ll be able to fail over to a different cell and keep the issue from impacting end users.
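The migration paragraph above mentions a deployment tool that automatically “stripes” a service deployment across cells so service owners don’t have to think about replication. The post doesn’t describe the tool’s placement logic; as a minimal sketch, assuming simple round-robin placement, striping could look like this (the function name and the `matchmaking` service are hypothetical):

```python
from collections import defaultdict


def stripe_replicas(service: str, replicas: int, cells: list[str]) -> dict[str, list[str]]:
    """Place `replicas` instances of `service` round-robin across `cells`, so
    losing any one cell removes at most ceil(replicas / len(cells)) of them."""
    if not cells:
        raise ValueError("need at least one cell")
    placement: dict[str, list[str]] = defaultdict(list)
    for i in range(replicas):
        placement[cells[i % len(cells)]].append(f"{service}-{i}")
    return dict(placement)


layout = stripe_replicas("matchmaking", 7, ["cell-a", "cell-b", "cell-c"])
print(layout)
# cell-a gets replicas 0, 3, 6; cell-b gets 1, 4; cell-c gets 2, 5
```

Because no cell holds more than a third of the replicas, deactivating any single cell for repair or reprovisioning leaves at least four of the seven instances serving traffic.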
Where this gets tricky is separating these cells enough to reduce the opportunity to propagate errors, while keeping things performant and functional. In a complex infrastructure system, services need to communicate with each other to share queries, information, workloads, etc. As we replicate these services into cells, we need to be thoughtful about how we manage cross-communication. In an ideal world, we redirect traffic from one unhealthy cell to other healthy cells. But how do we manage a “query of death”—one that’s causing a cell to be unhealthy? If we redirect that query to another cell, it can cause that cell to become unhealthy in just the way we’re trying to avoid. We need to find mechanisms to shift “good” traffic from unhealthy cells while detecting and squelching the traffic that’s causing cells to become unhealthy.
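The post frames detecting a “query of death” as an open problem rather than describing a solution. One simple heuristic, shown here purely as a hypothetical sketch, is to remember which query fingerprints were in flight when cells became unhealthy and to stop failing a fingerprint over once it has been implicated in several distinct cells. The class, method names, threshold, and fingerprints are all illustrative assumptions.

```python
from collections import defaultdict


class QueryOfDeathGuard:
    """Hypothetical sketch: remember which query fingerprints were in flight
    when a cell became unhealthy, and stop failing a fingerprint over to
    healthy cells once it has been implicated in enough distinct cells."""

    def __init__(self, max_cells_implicated: int = 2):
        self.max_cells_implicated = max_cells_implicated
        self._implicated: dict[str, set[str]] = defaultdict(set)

    def record_cell_failure(self, cell: str, in_flight: list[str]) -> None:
        """Called when `cell` goes unhealthy with these fingerprints in flight."""
        for fingerprint in in_flight:
            self._implicated[fingerprint].add(cell)

    def may_redirect(self, fingerprint: str) -> bool:
        """True if the query still looks like 'good' traffic that is safe to
        shift to another cell; False if it is a suspected query of death."""
        return len(self._implicated.get(fingerprint, set())) < self.max_cells_implicated


guard = QueryOfDeathGuard()
guard.record_cell_failure("cell-a", ["q-profile-load", "q-bad-join"])
guard.record_cell_failure("cell-b", ["q-bad-join"])
print(guard.may_redirect("q-bad-join"))      # False: implicated in two cells
print(guard.may_redirect("q-profile-load"))  # True: implicated in one cell
```

The threshold is the key trade-off: set it too low and popular, harmless queries that happen to be in flight during unrelated failures get squelched; set it too high and a bad query takes down several cells before it is caught.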
In the short term, we have deployed copies of computing services to each compute cell so that most requests to the data center can be served by a single cell. We are also load balancing traffic across cells. Looking further out, we’ve begun building a next-generation service discovery process that will be leveraged by a service mesh, which we hope to complete in 2024. This will allow us to implement sophisticated policies that will allow cross-cell communication only when it won’t negatively impact the failover cells. Also coming in 2024 will be a method for directing dependent requests to a service version in the same cell, which will minimize cross-cell traffic and thereby reduce the risk of cross-cell propagation of failures.
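Directing dependent requests to a service instance in the same cell can be pictured as cell-affinity routing. The sketch below is a hypothetical illustration, not the planned service mesh: the `Endpoint` type and `route` function are assumptions, and the boolean `allow_cross_cell` stands in for the “sophisticated policies” the post says will decide when cross-cell communication is safe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    service: str
    cell: str
    healthy: bool


def route(service: str, caller_cell: str, endpoints: list[Endpoint],
          allow_cross_cell: bool) -> Optional[Endpoint]:
    """Prefer a healthy instance of `service` in the caller's own cell; leave
    the cell only if a (hypothetical) policy says cross-cell traffic is safe."""
    healthy = [e for e in endpoints if e.service == service and e.healthy]
    same_cell = [e for e in healthy if e.cell == caller_cell]
    if same_cell:
        return same_cell[0]
    if allow_cross_cell and healthy:
        return healthy[0]
    return None  # fail fast instead of pushing load onto another cell


eps = [Endpoint("inventory", "cell-a", False), Endpoint("inventory", "cell-b", True)]
print(route("inventory", "cell-a", eps, allow_cross_cell=False))  # None
print(route("inventory", "cell-a", eps, allow_cross_cell=True))
# Endpoint(service='inventory', cell='cell-b', healthy=True)
```

Keeping the default path inside the caller’s cell is what limits cross-cell traffic; the fallback branch is exactly where a failure could leak into a healthy cell, so it is gated by policy.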
At peak, more than 70 percent of our back-end service traffic is being served out of cells and we’ve learned a lot about how to create cells, but we anticipate more research and testing as we continue to migrate our services through 2024 and beyond. As we progress, these blast walls will become increasingly stronger.
Roblox is a global platform supporting users all over the world, so we can’t move services during off-peak or “down time,” which further complicates the process of migrating all of our machines into cells and our services to run in those cells. We have millions of always-on experiences that need to continue to be supported, even as we move the machines they run on and the services that support them. When we started this process, we didn’t have tens of thousands of machines just sitting around unused and available to migrate these workloads onto.
We did, however, have a small number of additional machines that were purchased in anticipation of future growth. To start, we built new cells using those machines, then migrated workloads to them. We value efficiency as well as reliability, so rather than going out and buying more machines once we ran out of “spare” machines, we built more cells by wiping and reprovisioning the machines we’d migrated off of. We then migrated workloads onto those reprovisioned machines and started the process all over again. This process is complex: as machines are replaced and freed up to be built into cells, they don’t free up in an ideal, orderly fashion. They are physically fragmented across data halls, leaving us to provision them in a piecemeal fashion, which requires a hardware-level defragmentation process to keep the hardware locations aligned with large-scale physical failure domains.
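The rolling process above (build cells from spare machines, migrate, wipe the vacated machines, repeat) keeps the pool of free machines roughly constant, so the number of rounds depends on how many whole cells the spare pool can form. A back-of-the-envelope sketch, assuming one new-cell machine absorbs exactly one legacy machine’s workload and ignoring the defragmentation step; the function name and the fleet numbers in the example are illustrative, with only the ~1,400-machine cell size taken from the post:

```python
import math


def rebuild_rounds(legacy_machines: int, spare_machines: int, cell_size: int) -> int:
    """Rounds of build-cell, migrate, wipe, and reuse needed to move every
    legacy machine's workload into cells.

    Simplifying assumptions: one new-cell machine absorbs exactly one legacy
    machine's workload, only whole cells are built, and freed machines can be
    reused immediately (ignoring the physical defragmentation step)."""
    per_round = (spare_machines // cell_size) * cell_size
    if per_round == 0:
        raise ValueError("spare pool is smaller than one cell")
    # Each round frees as many legacy machines as it consumes, so the
    # rotating pool stays the same size from round to round.
    return math.ceil(legacy_machines / per_round)


# Illustrative numbers only: ~1,400-machine cells, a 3,000-machine spare pool.
print(rebuild_rounds(28_000, 3_000, 1_400))  # 10
```

Note that 200 of the 3,000 spare machines sit idle in every round because they can’t form a whole cell, one reason fragmented, piecemeal freeing of machines slows the process.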
A portion of our infrastructure engineering team is focused on migrating existing workloads from our legacy, or “pre-cell,” environment into cells. This work will continue until we’ve migrated thousands of different infrastructure services and thousands of back-end services into newly built cells. We expect this will take all of next year and possibly into 2025, due to some complicating factors. First, this work requires robust tooling to be built. For example, we need tooling to automatically rebalance large numbers of services when we deploy a new cell, without impacting our users. We’ve also seen services that were built with assumptions about our infrastructure. We need to revise these services so they do not depend upon things that could change in the future as we move into cells. We’ve also implemented both a way to search for known design patterns that won’t work well with cellular architecture and a methodical testing process for each service that’s migrated. These processes help us head off any user-facing issues caused by a service being incompatible with cells.
Today, close to 30,000 machines are being managed by cells. It’s only a fraction of our total fleet, but it’s been a very smooth transition so far with no negative player impact. Our ultimate goal is for our systems to achieve 99.99 percent user uptime every month, meaning we would disrupt no more than 0.01 percent of engagement hours. Industry-wide, downtime cannot be completely eliminated, but our goal is to reduce any Roblox downtime to a degree that it’s nearly unnoticeable.
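The 99.99 percent target is defined in terms of engagement hours rather than clock time, so it translates directly into a monthly disruption budget. A worked sketch of that arithmetic, where the function names and the one-billion-hour month are illustrative assumptions, not Roblox figures:

```python
def disruption_budget(total_engagement_hours: float, target_uptime: float = 0.9999) -> float:
    """Engagement hours that may be disrupted in a month while still meeting
    the target user uptime."""
    return total_engagement_hours * (1 - target_uptime)


def user_uptime(total_engagement_hours: float, disrupted_hours: float) -> float:
    """Fraction of engagement hours that were not disrupted."""
    return 1 - disrupted_hours / total_engagement_hours


# Hypothetical month with 1 billion engagement hours.
print(round(disruption_budget(1e9)))        # 100000
print(user_uptime(1e9, 50_000) >= 0.9999)   # True
```

In other words, at that scale no more than 100,000 engagement hours per month, 0.01 percent, could be disrupted while still meeting the goal.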
While our early efforts are proving successful, our work on cells is far from done. As Roblox continues to scale, we will keep working to improve the efficiency and resiliency of our systems through this and other technologies. As we go, the platform will become increasingly resilient to issues, and any issues that occur should become progressively less visible and disruptive to the people on our platform.
In summary, to date, we have:
As these cells become more interchangeable, there will be less crosstalk between cells. This unlocks some very interesting opportunities for us in terms of increasing automation around monitoring, troubleshooting, and even shifting workloads automatically.
In September, we also started running active-active experiments across our data centers. This is another mechanism we’re testing to improve reliability and minimize failover times. These experiments helped identify a number of system design patterns, largely around data access, that we need to rework as we push toward becoming fully active-active. Overall, the experiment was successful enough that we left it running for traffic from a limited number of our users.
We’re excited to keep driving this work forward to bring greater efficiency and resiliency to the platform. This work on cells and active-active infrastructure, along with our other efforts, will make it possible for us to grow into a reliable, high performing utility for millions of people and to continue to scale as we work to connect a billion people in real time.
The post How Roblox Reduces Spark Join Query Costs With Machine Learning Optimized Bloom Filters appeared first on Roblox Blog.
Every day on Roblox, 70 million users engage with millions of experiences, totaling 16 billion hours quarterly. This interaction generates a petabyte-scale data lake, which is enriched for analytics and machine learning (ML) purposes. It’s resource-intensive to join fact and dimension tables in our data lake, so to optimize this and reduce data shuffling, we embraced Learned Bloom Filters [1]—smart data structures using ML. By predicting presence, these filters considerably trim join data, enhancing efficiency and reducing costs. Along the way, we also improved our model architectures and demonstrated the substantial benefits they offer for reducing memory and CPU hours for processing, as well as increasing operational stability.
In our data lake, fact tables and data cubes are temporally partitioned for efficient access, while dimension tables lack such partitions, and joining them with fact tables during updates is resource-intensive. The key space of the join is driven by the temporal partition of the fact table being joined. The dimension entities present in that temporal partition are a small subset of those present in the entire dimension dataset. As a result, the majority of the shuffled dimension data in these joins is eventually discarded. To optimize this process and reduce unnecessary shuffling, we considered using Bloom Filters on distinct join keys but faced filter size and memory footprint issues.
To address them, we explored Learned Bloom Filters, an ML-based solution that reduces Bloom Filter size while maintaining low false positive rates. This innovation enhances the efficiency of join operations by reducing computational costs and improving system stability. The following schematic illustrates the conventional and optimized join processes in our distributed computing environment.
To optimize the join between fact and dimension tables, we adopted the Learned Bloom Filter implementation. We constructed an index from the keys present in the fact table and subsequently deployed the index to pre-filter dimension data before the join operation.
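The post doesn’t include its join code, so the following is a single-process Python sketch of the idea only, not the Spark implementation: build a conventional Bloom filter from the keys in a fact partition, use it to drop dimension rows that are definitely absent, and join only the rows that survive. The `user_id` column, the filter size, and the hash count are hypothetical choices. In a distributed engine the filter would presumably be shipped to every worker, which is where the memory cost discussed next comes from.

```python
import hashlib


class BloomFilter:
    """Classic Bloom filter: no false negatives, tunable false positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def prefiltered_join(fact_rows, dim_rows, key="user_id", num_bits=8192, num_hashes=5):
    """Inner-join fact rows to dimension rows, but first drop dimension rows
    whose key is definitely absent from this fact partition."""
    index = BloomFilter(num_bits, num_hashes)
    for row in fact_rows:
        index.add(str(row[key]))
    # Only these rows would be shuffled in a distributed join.
    kept = [d for d in dim_rows if str(d[key]) in index]
    dim_by_key = {d[key]: d for d in kept}
    joined = [{**f, **dim_by_key[f[key]]} for f in fact_rows if f[key] in dim_by_key]
    return joined, kept


facts = [{"user_id": 7, "minutes": 30}, {"user_id": 42, "minutes": 12}]
dims = [{"user_id": i, "country": "US" if i % 2 else "JP"} for i in range(1000)]
joined, kept = prefiltered_join(facts, dims)
print(len(joined), len(kept))  # 2 joined rows; kept is 2 plus any false positives
```

Because a Bloom filter never produces false negatives, the join result is exact; false positives only let a few extra dimension rows through to the shuffle.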
While a traditional Bloom Filter is efficient, hitting our desired false positive rate with one adds 15-25% more memory to every worker node that has to load it. By harnessing Learned Bloom Filters, we achieved a considerably smaller index while maintaining the same false positive rate. This works because a Learned Bloom Filter recasts membership testing as a binary classification problem: positive labels indicate that a value is present in the index, while negative labels mean it is absent.
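The memory cost of a conventional Bloom Filter follows from the standard textbook sizing formulas (not taken from the post): for n keys and false positive rate p, the optimal bit array has m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The size grows linearly with the number of keys, which is why a learned variant helps: its backup filter only has to hold the keys the model misses. The 100 million key count below is a hypothetical example.

```python
import math


def bloom_bits(n_keys: int, fp_rate: float) -> int:
    """Optimal bit-array size m = -n * ln(p) / (ln 2)^2 for n keys at rate p."""
    return math.ceil(-n_keys * math.log(fp_rate) / math.log(2) ** 2)


def bloom_hashes(n_keys: int, n_bits: int) -> int:
    """Optimal number of hash functions k = (m / n) * ln 2, rounded."""
    return max(1, round(n_bits / n_keys * math.log(2)))


n = 100_000_000  # hypothetical number of distinct join keys
m = bloom_bits(n, 0.01)
print(f"{m / n:.2f} bits per key, {m / 8 / 2**20:.0f} MiB, k = {bloom_hashes(n, m)}")
# 9.59 bits per key, 114 MiB, k = 7
```

At a 1% false positive rate, every key costs about 9.6 bits regardless of what the key looks like; a model that can recognize most present keys from their attributes lets the backup filter shrink in proportion to the keys it no longer has to store.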
The introduction of an ML model facilitates the initial check for values, followed by a backup Bloom Filter for eliminating false negatives. The reduced size stems from the model’s compressed representation and reduced number of keys required by the backup Bloom Filter. This distinguishes it from the conventional Bloom Filter approach.
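A minimal sketch of the model-then-backup structure described above, assuming a generic scoring function in place of the trained model. `toy_score` is a hypothetical stand-in for a classifier, and the class names and filter sizes are illustrative. The backup filter is built only from indexed keys the model scores below threshold, which is what guarantees no false negatives.

```python
import hashlib
from typing import Callable, Hashable, Iterable


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: Hashable) -> list[int]:
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: Hashable) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: Hashable) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(key))


class LearnedBloomFilter:
    """The model answers first; a backup Bloom filter holds only the indexed
    keys the model scores below threshold, so there are no false negatives."""

    def __init__(self, keys: Iterable[Hashable], score: Callable[[Hashable], float],
                 threshold: float = 0.5):
        self.score, self.threshold = score, threshold
        self.backup = BloomFilter()
        self.backup_size = 0
        for key in keys:
            if score(key) < threshold:  # model false negative
                self.backup.add(key)
                self.backup_size += 1

    def __contains__(self, key: Hashable) -> bool:
        return self.score(key) >= self.threshold or key in self.backup


def toy_score(user_id: int) -> float:
    """Hypothetical stand-in for a trained classifier."""
    return 0.9 if user_id % 3 == 0 else 0.1


indexed = list(range(0, 31, 3)) + [1, 4]
lbf = LearnedBloomFilter(indexed, toy_score)
print(all(k in lbf for k in indexed), lbf.backup_size)  # True 2
```

Only 2 of the 13 indexed keys need backup storage here. The trade-off is that the model’s own mistakes become false positives: any multiple of 3, such as 99, is reported as present even though it was never indexed.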
As part of this work, we established two metrics for evaluating our Learned Bloom Filter approach: the index’s final serialized object size and CPU consumption during the execution of join queries.
Our initial challenge was a highly imbalanced training dataset: only about one in three dimension table keys also appeared in the fact table. To tackle this, we leveraged the Sandwich Learned Bloom Filter approach [2]. This places a traditional Bloom Filter in front of the model to rebalance the dataset distribution: it removes the majority of keys that are missing from the fact table, eliminating most of the negative samples from the dataset. Subsequently, only the keys that passed the initial Bloom Filter (the keys present in the fact table plus the filter’s false positives) were forwarded to the ML model, often referred to as the “learned oracle.” This approach resulted in a well-balanced training dataset for the learned oracle, overcoming the bias issue.
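The sandwich structure can be sketched as three stages: an initial Bloom Filter over all indexed keys, the learned oracle, and a backup Bloom Filter for the indexed keys the oracle rejects. This is a minimal illustration under stated assumptions, not Roblox’s code: the oracle is a fixed hypothetical scoring function rather than a model trained on the keys that pass stage one, and the class names and filter sizes are illustrative.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key) -> list[int]:
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(key))


class SandwichedLearnedBloomFilter:
    """Initial Bloom filter, then learned oracle, then backup Bloom filter."""

    def __init__(self, keys, score, threshold=0.5):
        keys = list(keys)
        self.score, self.threshold = score, threshold
        # Stage 1: screens out most negatives before the oracle sees them.
        self.initial = BloomFilter()
        for key in keys:
            self.initial.add(key)
        # Stage 3: catches indexed keys the oracle wrongly rejects.
        self.backup = BloomFilter()
        self.backup_size = 0
        for key in keys:
            if score(key) < threshold:
                self.backup.add(key)
                self.backup_size += 1

    def __contains__(self, key) -> bool:
        if key not in self.initial:
            return False  # definitely absent
        if self.score(key) >= self.threshold:
            return True
        return key in self.backup


def toy_score(user_id: int) -> float:
    """Hypothetical stand-in for the trained learned oracle."""
    return 0.9 if user_id % 3 == 0 else 0.1


indexed = list(range(0, 31, 3)) + [1, 4]
swf = SandwichedLearnedBloomFilter(indexed, toy_score)
unindexed = range(33, 3000, 3)  # keys the oracle alone would wrongly accept
print(sum(k in swf for k in unindexed), "of", len(unindexed), "slip through")
```

Unlike the oracle on its own, which would accept every one of those 989 unindexed multiples of 3, the initial filter stops nearly all of them before the oracle is consulted; that same screening is what leaves the oracle a far more balanced set of keys to train on.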
The second challenge centered on model architecture and training features. Unlike the classic problem of phishing URLs [1], our join keys (which in most cases are unique identifiers for users/experiences) weren’t inherently informative. This led us to explore dimension attributes as potential model features that can help predict if a dimension entity is present in the fact table. For example, imagine a fact table that contains user session information for experiences in a particular language. The geographic location or the language preference attribute of the user dimension would be good indicators of whether an individual user is present in the fact table or not.
The third challenge—inference latency—required models that both minimized false negatives and provided rapid responses. A gradient-boosted tree model was the optimal choice for these key metrics, and we pruned its feature set to balance precision and speed.
Our updated join query using learned Bloom Filters is as shown below:
Here are the results of our experiments with Learned Bloom Filters in our data lake. We integrated them into five production workloads, each of which possessed different data characteristics. The most computationally expensive part of these workloads is the join between a fact table and a dimension table. The key space of the fact tables is approximately 30% of the dimension table’s key space. To begin with, we discuss how the Learned Bloom Filter outperformed traditional Bloom Filters in terms of final serialized object size. Next, we show performance improvements that we observed by integrating Learned Bloom Filters into our workload processing pipelines.
As shown below, at a given false positive rate, the two variants of the Learned Bloom Filter reduce total object size by 17-42% compared with traditional Bloom Filters.
In addition, by using a smaller subset of features in our gradient-boosted tree model, we lost only a small percentage of the size reduction while making inference faster.
In this section, we compare the performance of Bloom Filter-based joins to that of regular joins across several metrics.
The table below compares the performance of workloads with and without the use of Learned Bloom Filters. The comparison uses a Learned Bloom Filter with a 1% total false positive probability, and both join types run on the same cluster configuration.
First, we found that the Learned Bloom Filter implementation reduced CPU hours by as much as 60% compared with the regular join. We saw an increase in CPU usage in the scan step for the Learned Bloom Filter approach due to the additional compute spent evaluating the Bloom Filter. However, the prefiltering done in this step reduced the size of the data being shuffled, which reduced the CPU used by the downstream steps and thus the total CPU hours.
Second, joins using Learned Bloom Filters had about 80% less total data and wrote about 80% fewer shuffle bytes than a regular join. This leads to more stable join performance, as discussed below.
We also saw reduced resource usage in our other production workloads under experimentation. Over a period of two weeks across all five workloads, the Learned Bloom Filter approach generated an average daily cost savings of 25%, which also accounts for model training and index creation.
Due to the reduced amount of data shuffled while performing the join, we were able to significantly reduce the operational costs of our analytics pipeline while also making it more stable.
The following chart shows variability (using a coefficient of variation) in run durations (wall clock time) for a regular join workload and a Learned Bloom Filter based workload over a two-week period for the five workloads we experimented with. The runs using Learned Bloom Filters were more stable—more consistent in duration—which opens up the possibility of moving them to cheaper transient unreliable compute resources.
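The coefficient of variation is the standard deviation of run durations divided by their mean, a unitless measure that lets workloads with different typical run times be compared. A minimal sketch with made-up durations; the post doesn’t say whether it used the population or sample standard deviation, so the population version here is an assumption.

```python
from statistics import mean, pstdev


def coefficient_of_variation(durations: list[float]) -> float:
    """Standard deviation divided by mean: a unitless measure of run-to-run
    variability (population std. dev. here; the post doesn't say which)."""
    return pstdev(durations) / mean(durations)


# Hypothetical wall-clock durations in minutes, for illustration only.
regular = [60, 90, 45, 120, 75]
learned_bloom = [40, 42, 38, 41, 39]
print(round(coefficient_of_variation(regular), 3))        # 0.331
print(round(coefficient_of_variation(learned_bloom), 3))  # 0.035
```

A lower value means more predictable run times, which is the property that makes a workload a better fit for cheaper, interruptible compute.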
[1] T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. https://arxiv.org/abs/1712.01208, 2017.
[2] M. Mitzenmacher. Optimizing Learned Bloom Filters by Sandwiching. https://arxiv.org/abs/1803.01474, 2018.
¹As of 3 months ended June 30, 2023
²As of 3 months ended June 30, 2023
The post Insights From Our Latest Digital Expression, Fashion & Beauty Trends Report appeared first on Roblox Blog.
Self-expression is a vital part of many people’s experiences in immersive 3D spaces—especially Gen Z, who are growing up building connections in digital worlds. That’s why we’ve put together the 2023 Digital Expression, Fashion & Beauty Trends Report, which explores the full spectrum of self-expression through avatars, including brand considerations, the psychology behind creating an avatar look and the impact of authentic self-expression on people’s physical style, purchasing decisions, and even mental well-being.
This work builds on the research we did last year that provided valuable early insights on how people express themselves in immersive spaces. Our 2023 report offers new insights that will help creators, brands and industry experts better anticipate and respond to quickly evolving consumer needs.
Here are the top 5 takeaways from the 2023 report*:
In this year’s survey* of over 1,500 members of Gen Z in the U.S. and UK who are active on platforms like Roblox, 56% say styling their avatar is more important to them than styling themselves in the physical world. And for older Gen Z aged 22-26, 64% say that, given a choice, dressing up their avatar would be more important than dressing up in the physical world.
Additionally, 84% of Gen Z respondents say digital fashion is at least somewhat important for them, and 85% think the importance of digital fashion has grown at least some over the past year. More than half (53%) think it’s grown a lot.
These findings echo what we see on our platform: self-expression through digital identity and fashion is an essential part of people’s experience. For example, during the first three quarters of 2023 on Roblox, there were a total of 165 billion avatar updates, up 38% year over year, and people bought nearly 1.6 billion digital fashion items and accessories, up 15% year over year. Plus, millions of Roblox users continue to update their avatars every day.
But the influence of digital style and fashion doesn’t stay in the virtual world. In the survey, 84% report that their physical style is at least somewhat inspired by their avatar’s style, including 54% who say they are very or extremely inspired by what their avatar and other avatars wear.
When it comes to metaverse fashion, survey respondents stress that they care about distinct styles and brand recognition: 52% say “stylish digital clothes” is the attribute they pay most attention to when deciding if an avatar is “cool-looking.” And three in four say wearing digital fashions from a recognized brand is at least somewhat important, including 47% of survey respondents who say it’s very or extremely important.
This dynamic can drive purchasing behavior: 84% say that after wearing or trying on a brand’s item in virtual spaces, they’d be at least somewhat likely to consider this brand in the physical world. In fact, 50% say they’d be very or extremely likely to do so.
Meanwhile, designers and brands will be happy to learn that most Gen Z users are also willing to spend on digital fashion: in our survey, 52% say they’re comfortable budgeting up to $10 each month, while another 19% are willing to spend up to $20 monthly and an additional 18% are open to buying $50-$100 of items every month.
The launch of Limiteds this year highlighted Roblox users’ demand for exclusive and rare items, as evidenced by most Limiteds reselling for more than their original cost.
For example, community members lined up to earn Limiteds via challenges in the Gucci Ancora experience and to buy up items from Roblox-native brands like CHRUSH.
Similarly, a leading electronic music brand, Monstercat, recently teamed up with community creator @WhoseTrade on six single-edition necklaces. Each sold within minutes, including the Ruby Pendant, acquired for 1,000,001 Robux (approximately $10,000), the highest initial Limited sale to date.
While digital fashion is important to Gen Z users, people are also experimenting with other innovative ways of expression through their avatars.
One example of this is avatar makeup, which is already available in some community-created experiences. In addition, numerous brands (like Fenty Beauty, Maybelline, NARS, Givenchy Beauty, NYX, and L’Oréal) are now investing in avatar makeup to meet customer interest.
And there’s real opportunity for them. According to our survey, more than a third of all respondents (35%) say it’s important to customize their avatar’s makeup daily or weekly, and the number rises to 51% for self-identifying female respondents.
People are also increasingly customizing their avatar hair on Roblox. This year alone, users purchased more than 139 million hairstyles, up 20% over the year before, including more than 7.3 million people who bought five or more hairstyles on Roblox.
But self-expression doesn’t end there: Roblox users have increasingly been adopting emotes, and so far this year, 9.8 million Roblox users bought them, up 64% year over year. That’s something that Tommy Hilfiger took note of in introducing emotes into its Roblox digital fashion collection.
Users are also choosing fantastical auras that match their vibe, like a colorful variety available within Paris Hilton’s Slivingland.
And soon, Roblox users will be able to have expressive avatars featuring realistic emotions. That’s likely to be well-received by Gen Z users since 86% of survey respondents say it is at least somewhat important that their avatar is able to express emotions in order to feel fully represented in the metaverse.
One striking finding from the survey is that most members of Gen Z strive to look good in the metaverse for themselves rather than for others. When choosing their avatar look, 62% say they care a lot that their avatar looks good to them as compared to 37% who say they care a lot that it looks good to others.
And 40% of Gen Z feel it’s easier to present their authentic selves in the metaverse than in the physical world. Among the reasons cited: more “freedom of expression” and “creative options.” Further, people feel they “can be whoever we want” and that it’s “less judgemental” when they interact with others as avatars in immersive spaces.
In fact, our research showed:
Finally, respondents cited a positive impact on their mental well-being: 88% say expressing themselves in immersive spaces has likely helped them comfortably express themselves in the physical world. They note it helps build connections with others (29%), boosts confidence (24%), allows for true self-expression (21%), and helps improve mental health in other ways (25%).
Authentic self-expression is often described as a universal connector for people: by sharing who we truly are, we can make genuine connections. As Roblox continues building its platform and products for immersive communication and connection, we’re ensuring that people have the broadest set of opportunities to authentically express themselves. We’re excited to continue studying this space because, as our research demonstrated, when people have more control over the many elements they can choose to represent themselves in immersive 3D digital spaces, it can have a positive impact on their physical-world connections and well-being.
* Methodology: The ‘2023 Digital Expression, Fashion & Beauty Trends’ report includes two complementary sets of data:
The post Inside the Tech – Solving for Multilingual & Semantic Search appeared first on Roblox Blog.
Until about a year ago, Roblox search used a lexical system to match results to users’ searches, meaning it focused solely on text matching. But search behaviors are changing quickly and that approach is no longer sufficient to give users relevant content. At the same time, some Roblox users may use incorrect spelling in their queries. So, we have to be able to suggest results that match what they’re looking for, which means understanding their intent.
Another major problem in search is a lack of training data in many languages. Before semantic search, our first step was to leverage machine translation within the Roblox system: we indexed the translations and then did a text match. But that approach doesn’t always surface relevant content. So we adopted a state-of-the-art ML technique called a student-teacher model, in which the teacher learns from our biggest source of context for a given scenario.
English is the most-used language on Roblox, which is why the teacher model learns as many semantic relationships as possible in English. We then distill that knowledge into a student model that extends it to other languages. This helps us solve the problem even in languages where we don’t have much data, and it has led to a 15% increase in plays originating from search in Japan.
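The post doesn't spell out the distillation objective. A common recipe for multilingual distillation, assumed here, uses mean squared error to train the student so that two things both land on the frozen teacher's embedding of an English query:

- the student's embedding of that English query, and
- the student's embedding of the query's translation.

In this toy Python sketch, the teacher is a hypothetical lookup table and the student is a bag-of-words encoder. A real system would use transformer encoders for both.

```python
import random

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mse(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class BagOfWordsStudent:
    """Toy student encoder: a sentence embedding is the mean of its token vectors."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: dict[str, Vector] = {}

    def _vec(self, token: str) -> Vector:
        if token not in self.table:
            self.table[token] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.table[token]

    def encode(self, sentence: str) -> Vector:
        return mean_vector([self._vec(t) for t in sentence.split()])

    def step(self, sentence: str, target: Vector, lr: float) -> None:
        """One gradient-descent step on mse(encode(sentence), target)."""
        tokens = sentence.split()
        pred = self.encode(sentence)
        grad = [2 * (p - t) / self.dim for p, t in zip(pred, target)]
        for tok in tokens:  # each token contributes 1/len(tokens) of the mean
            vec = self._vec(tok)
            for i in range(self.dim):
                vec[i] -= lr * grad[i] / len(tokens)


# Frozen teacher: English query -> embedding (hypothetical fixed vectors).
teacher = {
    "racing game": [1.0, 0.0, 0.0],
    "dragon fight": [0.0, 1.0, 0.0],
}
# Parallel data (Vietnamese): "đua xe" = racing, "chiến đấu với rồng" = fight a dragon.
parallel = [("racing game", "đua xe"), ("dragon fight", "chiến đấu với rồng")]


def distill(student, pairs, teacher_emb, epochs=200, lr=0.5):
    for _ in range(epochs):
        for english, translated in pairs:
            target = teacher_emb[english]
            # Match the teacher on the English text, and pull the translation
            # to the same point in embedding space.
            student.step(english, target, lr)
            student.step(translated, target, lr)


def translation_loss(student, pairs, teacher_emb):
    return sum(mse(student.encode(t), teacher_emb[e]) for e, t in pairs)


student = BagOfWordsStudent(dim=3)
loss_before = translation_loss(student, parallel, teacher)
distill(student, parallel, teacher)
loss_after = translation_loss(student, parallel, teacher)
```

The translated query ends up near the English query's teacher embedding. A search in Vietnamese can therefore reuse semantic relationships learned mostly from English data.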
We’ve recently been working to better support out-of-catalog queries like “đua xe (racing).” At the same time, users are increasingly submitting long, freeform queries, such as, “Hey, I remember playing a game where there was a dragon and a girl fighting with it. Can you help me find that?” These present more technical challenges, and we’re continuing to improve our systems along these lines.
We’ve built a hybrid search system that combines lexical search with ML techniques and models for semantic search and query-intent understanding. We’re continuously evolving our systems to build context understanding, handle complex queries, and return relevant content.
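The post doesn't say how lexical and semantic scores are combined in the hybrid system. One simple option, assumed here, is a weighted blend: `alpha * lexical + (1 - alpha) * semantic`. In this Python sketch, token overlap stands in for a real lexical scorer such as BM25. The embeddings are hand-written toy vectors, and `alpha` is a hypothetical tuning weight.

```python
import math


def lexical_score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear in the document text."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query: str, query_vec: list[float], docs, alpha: float = 0.5):
    """Rank (title, vector) pairs by alpha * lexical + (1 - alpha) * semantic."""
    scored = []
    for title, vec in docs:
        score = alpha * lexical_score(query, title) + (1 - alpha) * cosine(query_vec, vec)
        scored.append((score, title))
    scored.sort(reverse=True)
    return [title for _, title in scored]


docs = [
    ("Speed Racing Simulator", [0.9, 0.1, 0.0]),
    ("Dragon Battle Arena", [0.1, 0.9, 0.1]),
    ("Kart Rally", [0.8, 0.0, 0.2]),  # shares no words with the query
]
# Misspelled query: lexical matching only catches "racing", but the (toy)
# query embedding still points toward racing content.
ranking = hybrid_rank("racing gmae", [1.0, 0.0, 0.1], docs)
```

"Kart Rally" shares no words with the misspelled query. It still ranks above "Dragon Battle Arena" because the semantic term carries it. That is the gap a purely lexical system leaves open.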
The magic of semantic search is in the embeddings, which are rich representations of a variety of signals we get from all across Roblox. For example, we’re incorporating signals like user demographics and the user’s query itself, including how long it is and what its unique aspects are.
We’re also looking at content signals, like experiences, avatar items, and engagement: how often was this game played, how many users did it have, and from how many countries? There are also signals like monetization and retention, as well as metadata like an experience’s title, description, or creator. We put all of these through a BERT-based transformer architecture and use a Multilayer Perceptron at the end to generate embeddings, which become our source of truth.
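Here is a minimal Python sketch of that final stage. It shows only the shape of the computation: fused features go in, and a fixed-size, unit-length embedding comes out. It is not Roblox's model, and it rests on these assumptions:

- A hashed bag of words stands in for the BERT-based encoder.
- Only two engagement counts are used as extra signals.
- The signals are concatenated before a two-layer perceptron, which is just one plausible arrangement.
- The weights are random rather than trained.

```python
import hashlib
import math
import random

TEXT_DIM = 8  # toy size; a real BERT-style encoder outputs far larger vectors


def stand_in_text_encoder(text: str, dim: int = TEXT_DIM) -> list[float]:
    """Hypothetical stand-in for a BERT-style encoder: a hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode()).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def engagement_features(plays: int, countries: int) -> list[float]:
    """Log-scale raw counts so very popular items don't swamp other signals."""
    return [math.log1p(plays), math.log1p(countries)]


class MLPHead:
    """Two-layer perceptron mapping fused features to a unit-length embedding."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # untrained random weights, for shape only
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(out_dim)]

    def __call__(self, x: list[float]) -> list[float]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        out = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        norm = math.sqrt(sum(v * v for v in out))
        return [v / norm for v in out]  # L2-normalize so dot product = cosine


text_part = stand_in_text_encoder("Speed Racing Simulator fast cars")
fused = text_part + engagement_features(plays=120_000, countries=35)
head = MLPHead(in_dim=len(fused), hidden=16, out_dim=4)
embedding = head(fused)
```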
Another innovation is our in-house similarity search system. When someone makes a search query, we retrieve the most closely related embeddings and rank them to make sure they’re relevant to what the user is looking for. Then we return the results to the user.
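The retrieve-then-rank flow can be sketched in Python as two stages:

1. Find the top-k items by cosine similarity to the query embedding.
2. Re-score those candidates with extra signals.

The brute-force search, the popularity prior, and its weight are all hypothetical stand-ins. Nothing here describes Roblox's in-house system.

```python
import heapq
import math


def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def retrieve(query_vec, index, k):
    """Stage 1: top-k (similarity, item_id) pairs by cosine similarity.

    Brute force over every item; large systems typically use an
    approximate nearest-neighbor index for this step instead.
    """
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), item_id)
        for item_id, vec in index.items()
    )
    return heapq.nlargest(k, scored)


def rerank(candidates, popularity, weight=0.1):
    """Stage 2: re-score retrieved candidates with a hypothetical popularity prior."""
    rescored = [(sim + weight * popularity.get(item_id, 0.0), item_id)
                for sim, item_id in candidates]
    rescored.sort(reverse=True)
    return [item_id for _, item_id in rescored]


index = {
    "speed-racing": [0.9, 0.1, 0.0],
    "kart-rally": [0.8, 0.0, 0.2],
    "dragon-arena": [0.1, 0.9, 0.1],
    "cooking-tycoon": [0.0, 0.2, 0.9],
}
popularity = {"speed-racing": 0.9, "kart-rally": 0.2}  # hypothetical, in [0, 1]
candidates = retrieve([1.0, 0.0, 0.1], index, k=2)
results = rerank(candidates, popularity)
```

Retrieval returns the two racing items almost tied, with "kart-rally" slightly ahead. The ranking stage then flips their order using the extra signal. That separation lets the cheap first stage stay simple while the second stage applies richer relevance logic to a short list.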
Every language presents its own unique challenge. And especially with search, we need to understand what users in different parts of the world are looking for so that we can show them the most relevant results. We have to understand different language elements. For example, pre-trained transformers have been essential to understanding the multiple dialects of Japanese.
Secondly, search query patterns have been changing quite a bit and we have to continuously evolve our technology stack to keep up. At the same time, we need to inform our users about what is possible on our platform, as they may not realize it. For example, we could tell our users that search can support things like freestyle queries (such as racing games or popular food games) and that it understands what people are looking for and can return appropriate results.
Taking the long view is core to our team and it’s one of the reasons why I love working at Roblox.
One example from my team is our tech stack, which consists of our ML- and NLP-based search systems—semantic search, autocomplete and spelling correction using pre-trained large models.
We’ve built this with reusability in mind across different types of searches made by our tens of millions of daily active users. That means we can plug in a different type of data (for example, avatar items instead of experiences), and it should work with minimal changes.
We’ve incorporated semantic search for experiences, and we’ve shared it with other verticals like Marketplace, and they’ve been able to just jump on the existing architecture. It’s not perfectly plug-and-play, but with some fine-tuning, we can adapt it across different use cases.
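One way to get this kind of reusability is to keep the indexing and search logic generic. The item type and its embedding function then become the only pluggable parts. `SemanticIndex`, the keyword-count embedder, and the `Experience`/`AvatarItem` types in this minimal Python sketch are hypothetical illustrations, not Roblox's code.

```python
import math
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
AXES = ["racing", "dragon", "hat"]  # hypothetical toy embedding dimensions


def toy_text_embed(text: str) -> list[float]:
    """Stand-in embedder: counts of a few keywords (not a real model)."""
    words = text.lower().split()
    return [float(words.count(axis)) for axis in AXES]


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class SemanticIndex(Generic[T]):
    """Shared retrieval logic; each vertical supplies only an embed function."""

    def __init__(self, embed: Callable[[T], list[float]]) -> None:
        self.embed = embed
        self.entries: list[tuple[T, list[float]]] = []

    def add(self, item: T) -> None:
        self.entries.append((item, self.embed(item)))

    def search(self, query_vec: list[float], k: int = 3) -> list[T]:
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [item for item, _ in ranked[:k]]


@dataclass
class Experience:
    title: str
    description: str


@dataclass
class AvatarItem:
    name: str
    category: str


# Same index class, two verticals: only the featurization differs.
experiences = SemanticIndex[Experience](lambda e: toy_text_embed(f"{e.title} {e.description}"))
avatar_items = SemanticIndex[AvatarItem](lambda a: toy_text_embed(f"{a.name} {a.category}"))

experiences.add(Experience("Speed Racing", "racing on city streets"))
experiences.add(Experience("Dragon Quest", "fight a dragon"))
avatar_items.add(AvatarItem("Dragon Horns", "hat"))
avatar_items.add(AvatarItem("Racing Helmet", "hat"))

racing_query = toy_text_embed("racing")
top_experience = experiences.search(racing_query, k=1)[0]
top_item = avatar_items.search(racing_query, k=1)[0]
```

In practice, a new vertical would still need its own fine-tuned embedding model. The sketch only shows why the surrounding retrieval code can stay unchanged.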
Search is the only surface where users express their explicit intent. And that means it’s essential that we understand what they want and give them the most relevant results. So it’s really exciting to me to work on understanding that intent and educating our users about what is possible, sometimes even before the user realizes it.
A user in any country can ask for something, and we can give them exactly what they want and what’s most relevant to them. This builds trust, which in turn improves retention. It’s exciting to me to take on the challenge of improving search to build that trust and help Roblox achieve its goal of having a billion users.