343: Explanation of What Happened with the MCC & Why It Was Left Unpatched Until Now - NeoGAF
This has popped up in a couple of places, such as Reddit and the Halo 5 OT: Frank O'Connor's long-promised explanation of why the MCC turned out the way it did. It's a long read, but it goes into detail on why things were as broken as they were and why they weren't patched at the time.
Quote, originally posted by Stinkles:
Reposting from Waypoint:
As you saw on the livestream earlier this week, it was a big relief to be able to share the news that we're fixing and indeed updating MCC. I was traveling this week and out of the office while the livestream was happening, but if you remember, a couple of years ago I promised to explain what the underlying issues with MCC were. Well here it is, sort of. I'm obviously not an engineer, so apologies to the deeply technical for keeping this at a level I understand.
I'd also like to be clear up front about what's not contained here - there are no excuses. We're explaining some of the issues in more depth than we have before, in part because we now have the resources, OS and capability to make meaningful changes. So please don't mistake explanations for excuses - we're trying to be as transparent as possible, but there are loads of proprietary things we can't talk about at a granular level. We will, however, have kind of a second half of this post when the update is released, where we can go through the causes of items that were fixed by the new update, rather than jinx them right now.
There are also no guarantees in here, beyond my guarantee that we care about this very much and are throwing our best people and best efforts into this project.
On Wednesday of this week, we announced that we're both fixing MCC and working on enhancements for the Scorpio (Xbox One X) version of the game, but I should be clear here that, in terms of chicken/egg scenarios, fixing the existing "vanilla" Xbox One MCC was the chicken that laid the Xbox One X enhanced version egg. Without the ability and opportunity to reconfigure and fix this thing, we wouldn't touch an Xbox One update. But a series of changes to the Xbox architecture - some of them related to Xbox One X, and others just a series of ongoing improvements to the OS and back end networking systems - have cracked open an opportunity we've wanted to seize for many, many months now.
So to be super clear, these fixes will apply to both the regular Xbox One version and the Xbox One X enhanced version. We're also getting a lot of help from the (wholly separate) Halo 5 team, who created a much more robust system for the launch of that game and continue to make improvements to their networking model.
From a personal perspective, the MCC launch was one of my lowest ebbs, professionally. Every angry mail I received, I took to heart. I felt like I had personally let our fans down. I have not spent a single day since the night the game fell down in matchmaking where I didn't think about it. The hardest messages to deal with were the ones driven by disbelief. "How could you not know that matchmaking was going to break?" - fundamentally it was because we were testing it in an environment that we had set up incorrectly and with some (as we discovered later) faulty assumptions. And unlike some of our other normal testing cycles, we weren't testing for gameplay balance and stuff that the original releases already contained, so our test process was radically different, and we made mistakes in some of the scenarios we asked for.
We had, with the best intentions, created a massive and ambitious project that almost read like a Halo fan's wishlist. As a player, I was incredibly excited. And as an employee, I was proud of the work and effort the team had poured into making this thing so big. It initially started as a conversation about making a Halo Anniversary 2 - we thought about simply replicating what we'd done with the first Halo: Combat Evolved Anniversary, a polished update with some cool new graphics and features. But we kept talking about it - and the conversation inevitably led to the "problem" of a franchise existing over multiple generations of hardware. This was built for Xbox One - and prior games were spread across 360 and OG Xbox. So we figured, why not finally put the whole Chief saga on one console?
We wanted everyone to be able to enjoy his entire story. And so the project ballooned in scope and scale and ambition. We threw a ton of resources behind it internally and worked with some trusted partners.
In our matchmaking testing we were seeing results that ultimately weren't reflective of the real retail environment, and our test sessions never got to the kind of scale where we'd see some of the looping issues I'll describe below. So we genuinely didn't know, until the day it released, how bad the matchmaking in particular was going to get. I'm not going to ignore the other bugs, they were real, and important, but the way the UI and matchmaking protocols interacted with each other exacerbated many of the smaller items and amplified a couple of them in unpredictable ways.
The short version was that for Xbox One we built some of the underlying systems to work on a brand-new platform, which was fundamentally quite different from the original consoles the games were designed for. We also had some very new (and frankly these have evolved since then and are now much better) online systems on a new console, and made some educated but (with hindsight) ultimately faulty assumptions during development and testing. To be clear here, the platform networking model was working as intended, but we made errors and ultimately approached it with the wrong strategy.
Frankly, we don't assume anything anymore. While we had some valid reasons to believe the game would function properly in the retail environment, we've shifted our development philosophy to basically assume nothing. One way we're going to avoid repeating those mistakes in future is through a retail flighting program - testing the game fixes in a real-world environment with real players, including many of you. Naturally we'll also be doing much more rigid conventional testing with the benefit of both hindsight and new, better systems.
One of the main matchmaking issues was related to the way that the games gathered players - each title had some differences in how it sought out players, then connected them into sessions. In an attempt to unify that method, we actually introduced (with the benefit of hindsight) several avoidable problems and some unavoidable ones. It gets really technical, and this is as much metaphor as technical explanation, but each potential player was assigned a kind of "ticket" which would then grant them entry into a match or session - picture a virtual waiting room at a train station - when the train arrives (a match) - everyone has to board - or the train can't leave. Issues arose when folks left sessions before games had started, which would cause the initial ticket distribution to fail, and that sometimes meant very long wait times for matches as tickets were issued and reissued - especially in countries with lower populations.
Now the above isn't particularly unusual or original in terms of approach, but at the time the systems were less resilient in terms of churn, and bad information could cause a lobby or match to get caught in a state where it couldn't ultimately complete a group and join them cleanly into a session. At the time we made tons of changes to the backend server configurations to try and reduce those wait times, but ultimately it was a self-fulfilling prophecy - players understandably would leave sessions because they got tired of waiting for a match to begin, and that would amplify the issue across the board.
But there were other issues that compounded the noise and frustration players felt. For example, there's a good-sized subset of our population that has issues with Teredo, IPsec and NAT compatibility that we simply can't troubleshoot or identify, and some of those users are encountering issues that are literally beyond our control - trapped behind corporate or academic firewalls.
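The ticket and waiting-room metaphor above can be made concrete with a small toy model. This is only an illustrative sketch, not how MCC's matchmaking was actually built: the `WaitingRoom` class, its method names, and the rule that one departure forces every remaining ticket to be reissued are all assumptions chosen to mirror the metaphor.

```python
from dataclasses import dataclass, field


@dataclass
class WaitingRoom:
    """Hypothetical toy lobby: the 'train' only leaves when every seat holds a ticket."""
    seats: int
    tickets: dict = field(default_factory=dict)  # player name -> ticket number
    next_ticket: int = 1
    reissued: int = 0  # how many tickets had to be handed out again

    def _issue(self, player):
        self.tickets[player] = self.next_ticket
        self.next_ticket += 1

    def join(self, player):
        self._issue(player)

    def leave(self, player):
        # Assumption for illustration: a departure before the match starts
        # invalidates the original distribution, so everyone still waiting
        # gets a fresh ticket.
        if self.tickets.pop(player, None) is None:
            return
        for waiting in list(self.tickets):
            self._issue(waiting)
            self.reissued += 1

    def can_depart(self):
        return len(self.tickets) == self.seats


room = WaitingRoom(seats=4)
for name in ["ana", "ben", "cy"]:
    room.join(name)

room.leave("ben")            # one impatient player quits early
room.join("dee")
ready_with_three = room.can_depart()
room.join("eli")
ready_with_four = room.can_depart()

print(room.reissued, ready_with_three, ready_with_four)
```

In this toy version a single early departure costs two extra ticket handouts and delays the match until a replacement arrives. With a small player pool and repeated departures, that kind of churn compounds, which is the amplifying loop the post describes.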
These feed into some of the areas we're planning to improve, and ongoing improvements to the Xbox systems have improved some of those issues, but not all.
We've loosely explained this over the last couple of years, but I'd like to reiterate here. We ended up in a situation where the game was working for the vast majority of users. That's not the same as perfect, or for some, even acceptable, and that's not what I'm trying to claim. Many people who complain about the game these days have legitimate issues with matchmaking and other aspects, and we're not going to dismiss those - so even though for most, the game is stable, and the sheer wealth of content and experiences makes it - to this day - a highly rated title by players - we get it. If you're one of the people affected, seeing a statement from us that it's working for "the vast majority of players" is cold comfort. When you're the one affected, the failure rate is as good as 100%.
The fixes and patches we'd applied were pretty delicate and we ended up in a precarious situation where there was no way to make more fixes without potentially breaking something else or making things worse. We weren't happy with that situation, but we were stuck between a rock and a hard place - most users were (by this time) able to play properly and find matches, and further tinkering might put that at risk. At that time we decided the right thing to do for the total player base was to stop. That was hard to do, especially knowing there were still some customers impacted more seriously than players who were merely inconvenienced.
But that doesn't mean we stopped being concerned about it. On the contrary, in some ways leaving it was worse. I mention this not to garner sympathy, we deserve none, but to answer folks who've continued to ask, "Why don't you guys care?" We do. Everyone here puts their heart and soul and sweat and tears into building our games.
I can tell you without hesitation that I have never heard someone here dismiss or ignore or belittle complaints. We always take them to heart. It's the internet of course, so sometimes folks take it too far, with threats or other inappropriate reactions, but I'd be lying if I said I didn't understand the anger or disappointment those came from.
So over the months we discussed and investigated other fixes. The platform itself has made some truly evolutionary improvements to its underlying technology, and recent fundamental changes mean that we might have the opportunity to make some fixes without risking everything else. It may sound simplistic, but MCC was essentially six pretty different game engines strapped together and interlinked with highly complex and highly delicate new systems. With Xbox One X on the horizon, it was obvious that we could simultaneously update the game to take advantage of the new hardware for folks that have it and use that as an opportunity to finally rearchitect and update some of the foundational issues and networking/matchmaking methods.
And to be clear, these solutions were simply not possible until quite recently. The platform team has made numerous improvements over the last year or so, and we've internally done a bunch of research, and so our timing has been reliant on a number of systems and solutions converging rather than one single element. But these weren't easy fixes we were simply sitting on. That's honestly not a thing, even.
I also understand that silence can be frustrating. You have complaints or questions, and we try to answer them as best we can, but sometimes bad information is worse than none. As I said at the start of this explanation, it doesn't answer all your questions. I'm going to follow up next year after we have better detail on the fixes and the Xbox One X update, to follow through with an even more detailed technical breakdown of what broke, why and how we fixed it.
That's what we owe you - that and a game we can both finally be satisfied with.