Fortnite Destroys PUBG’s Concurrent Player Record, Servers Immediately Crash


Although PlayerUnknown’s Battlegrounds dominated the gaming landscape in 2017, Epic Games’ Fortnite has been making huge leaps in popularity, especially since the title’s free-to-play Battle Royale mode was implemented. The game’s popularity came to a head recently, and as a result Fortnite officially surpassed PUBG’s record of concurrent players on Steam by roughly 200,000 players.

Unfortunately, the Fortnite servers couldn’t handle the strain, and they suffered numerous outages and downtime between February 3rd and 4th. The team behind PUBG’s toughest competitor took to its website to announce the record-breaking numbers, give the community an in-depth look at why the server crashes occurred, and explain what is being done to prevent further incidents. While the nitty-gritty details and graphs can be found here, below are the team’s next steps to “ensure service availability”.

Fortnite – What’s Next

  • Identify and resolve the root cause of our DB performance issues. We’ve flown Mongo experts on-site to analyze our DB and usage, as well as provide real-time support during heavy load on weekends.
  • Optimize, reduce, and eliminate all unnecessary calls to the backend from the client or servers. Some examples: periodically verifying user entitlements when this already happens implicitly with each game service call; registering and unregistering individual players on a gameplay session when these calls can be done more efficiently in bulk; deferring XMPP connections to avoid thrashing during login/logout scenarios; and ensuring social features recover quickly from ELB or other connectivity issues. When 3.4 million clients are connected at the same time, these inefficiencies add up quickly.
  • Optimize how we store the matchmaking session data in our DB. Even without a root cause for the current write queue issue we can improve performance by changing how we store this ephemeral data. We’re prototyping in-memory database solutions that may be more suited to this use case, and looking at how we can restructure our current data in order to make it properly shardable.
  • Improve our internal operational excellence focus in our production and development process. This includes building new tools to compare API call patterns between builds, setting up focused weekly reviews of performance, expanding our monitoring and alerting systems, and continually improving our post-mortem processes.
  • Improve our alerting and monitoring of known cloud provider limits, and subnet IP utilization.
  • Reducing blast radius during incidents. A number of our core services impact all players globally. While we operate game servers all over the world, expanding to additional cloud providers and supporting core services in multiple geographical locations will help reduce player impact when services fail. Expanding our footprint also increases our operational overhead and complexity. If you have experience running large worldwide multi-cloud services and/or infrastructure, we would love to hear from you.
  • Rearchitecting our core messaging stack.  Our stack wasn’t architected to handle this scale and we need to look at larger changes in our architecture to support our growth.
  • Digging deeper into our data and DB storage.  We hit new and interesting limits as our services grow and our data sets and usage patterns grow larger and larger every day.  We’re looking for experienced DBAs to join our team and help us solve some of the scaling bottlenecks we run into as our games grow.
  • Scaling our internal infrastructure. When our game services grow in size so do our internal monitoring, metrics, and logging along with other internal needs.  As our footprint expands our needs for more advanced deployment, configuration tooling and infrastructure also increases.  If you have experience scaling and improving internal systems and are interested in what is going on here at Epic, let’s have a chat.
  • Performance at scale.  Along with a number of things mentioned, even small performance changes over N nodes collectively make large impacts for our services and player experience.  If you have experience with large scale performance tuning and want to come make improvements that directly impact players please reach out to us.
  • MCP Re-architecture
    • Move specific functionality out of MCP to microservices
    • Event sourcing data models for user data
    • Actor based modeling of user sessions
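One concrete inefficiency Epic calls out above is registering and unregistering individual players on a session when the calls could be made in bulk. The sketch below illustrates why that matters at scale: every per-player call is a backend round trip, while a bulk call amortizes the trip across the whole lobby. All names here (`BackendClient`, `register_player`, `register_players`) are hypothetical stand-ins for illustration, not Epic's actual API.

```python
# Minimal sketch of batching per-player backend calls into a single bulk call.
# The BackendClient stub only counts round trips; a real client would issue
# network requests against a session service.

class BackendClient:
    """Stub backend that counts network round trips."""
    def __init__(self):
        self.round_trips = 0

    def register_player(self, session_id, player_id):
        self.round_trips += 1  # one network call per player

    def register_players(self, session_id, player_ids):
        self.round_trips += 1  # one network call for the entire batch

def start_session_naive(backend, session_id, player_ids):
    # N players -> N round trips
    for pid in player_ids:
        backend.register_player(session_id, pid)

def start_session_bulk(backend, session_id, player_ids):
    # N players -> 1 round trip
    backend.register_players(session_id, player_ids)

naive, bulk = BackendClient(), BackendClient()
players = [f"player-{i}" for i in range(100)]
start_session_naive(naive, "match-1", players)
start_session_bulk(bulk, "match-1", players)
print(naive.round_trips, bulk.round_trips)  # 100 round trips vs. 1
```

With 3.4 million concurrent clients, collapsing 100 calls into 1 per session is the difference between a database that keeps up and the write-queue backlog Epic describes.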

The Fortnite team ended their massive blog post with the following message:

Problems that affect service availability are our primary focus above all else right now. We want you all to know we take these outages very seriously, conducting in-depth post-mortems on each incident to identify the root cause and decide on the best plan of action.   The online team has been working diligently over the past month to keep up with the demand created by the rapid week-over-week growth of our user base.

While we cannot promise there won’t be future outages as our services reach new peaks, we hope to live by this great quote from Futurama: “When you do things right, people won’t be sure you’ve done anything at all.”

Fortnite is now available for PC, PlayStation 4, and Xbox One.

So, thoughts on the recent Fortnite server issues? Did you or any of your friends notice any issues while playing the game last weekend? Let us know in the comments section below, and as always, stay tuned to Don’t Feed the Gamers for all the latest gaming and entertainment news! Don’t forget to follow DFTG on Twitter for our 24/7 news feed!

Ryan "Cinna" Carrier

Ryan is the Lead Editor for Don't Feed the Gamers. When he isn't writing, Ryan is likely considering yet another playthrough of Final Fantasy IX. He's also the DFTG cinnamon bun.
