Welcome back, dear reader! We mentioned in last week’s review that we had a few additional notes about the meta to talk about this week so we’re back with those notes and a few explanations and write-ups about our stats.
Thanks to the wonderful efforts of tournament organizers and app developers around the world, we have access to what is essentially every meaningful piece of data around competitive games of 40k. The data in this month’s study comes from:
- Best Coast Pairings/Down Under Pairings apps, the premier way to run, manage, and track results for tournaments
- The ITC Battles App, a brilliant app for tracking games both in and out of tournaments and a great source of casual game data worldwide
To start things off, we’re going to talk a bit about Glicko scores. We’ve used the Elo-like Glicko scores a few times in recent meta reviews, and we’ve gotten some questions about what they are, how they’re calculated, and what they mean, so today we’re addressing those questions.
Understanding Glicko Scores
Measuring the strength of the factions in a game like Warhammer 40,000 is a difficult problem. There are many different factions, being played worldwide by many different players, and the matchups between them and other factions are uneven. With as many moving parts as exist across all these armies, how do you measure the relative strength of a faction? Raw win rates give us one measure, but don’t really account for factions that may have high win rates against weaker factions but lose to the game’s top factions – this is something we’ve seen with Chaos Knights, which typically finish with high win rates at events but do so after losing their first or second game, meaning they’re often cleaning up against opponents with no chance of winning the event. The Falcon’s Tournaments in Winning Position (TiWP) addresses this directly by looking at how often factions are in a position to win an event, and comparing that to the percentage of the field they represent. But this too has limitations – it’s dependent on tournament data, and can be affected heavily by luck or player choices around armies. These are both good measures, but we were looking for another way to gauge faction strength.
Rather than try to develop this from scratch, I decided to see if a chess rating system could be used to answer this question. There are multiple approaches that could have been used – Elo is the most famous of these – but I selected the Glicko-2 rating system. At its most basic level, the Glicko score is a rating of player skill that rewards playing against better opponents; when you beat a player with a much higher rating than you, your score increases by more than if you beat someone with an even or lower rating. Likewise, losing to someone with a lower rating is more detrimental to your score. In addition, Glicko-2 adds both a rating deviation (RD) and a rating volatility measure, which can be used to help describe the amount of uncertainty around a player’s rating.
Because we aren’t super interested in gauging an individual’s performance (and because we don’t have that in our data), we can instead treat each faction like a player and score it accordingly. Each faction is treated as a chess player who plays a large number of games per day against other factions acting as players. This means that a faction’s Glicko score will increase by greater amounts when its players score wins against players of other highly rated factions, and by smaller amounts for beating up on the game’s weaker factions.
To begin the analysis, each army starts with a rating of 1500 and an RD of 350. This can be interpreted as saying that you are roughly 95% sure that the true rating of the army is between 800 and 2200. Without any data, this confidence interval is quite wide. However, as you gain observations it quickly shrinks. Glicko-2 scores are then updated for each day in the sample. Each day you compare the actual win/loss results for a faction to the expected results of each game they played against an opposing faction. You calculate this using the ratings of each army and the RD of the opponent to essentially create a probability of success. This chance is 50% if the opponent is the same faction, but higher or lower depending on the opponent’s rating. You then update each faction’s ratings with these new values and repeat the process the next day. Our model uses all of the ratings and RDs from the prior day and treats every game on a day as though it ended before midnight.
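To make the daily update a little more concrete, here is a minimal sketch of a single rating period. Note the assumptions: it uses the simpler original Glicko formulas rather than the full Glicko-2 update (which adds the volatility step), it skips the gradual RD growth that applies between periods, and the function names and example games are made up for illustration rather than taken from our model or data.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Discount factor: results against opponents with uncertain ratings count for less."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score (roughly, win probability) against one opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def update(r, rd, games):
    """Apply one rating period (original Glicko, not Glicko-2).

    `games` is a list of (opp_rating, opp_rd, score) tuples,
    where score is 1 for a win, 0.5 for a draw, and 0 for a loss.
    """
    if not games:
        return r, rd
    # How much information this period's games carry about the true rating.
    info = sum(
        g(o_rd) ** 2 * expected(r, o_r, o_rd) * (1 - expected(r, o_r, o_rd))
        for o_r, o_rd, _ in games
    )
    d2 = 1 / (Q**2 * info)
    # Actual results minus expected results, weighted by opponent certainty.
    surprise = sum(g(o_rd) * (s - expected(r, o_r, o_rd)) for o_r, o_rd, s in games)
    denom = 1 / rd**2 + 1 / d2
    return r + (Q / denom) * surprise, math.sqrt(1 / denom)


# Starting point: rating 1500, RD 350, so roughly 95% of the mass sits within 2 RDs.
low, high = 1500 - 2 * 350, 1500 + 2 * 350

# Hypothetical period: one win over a weaker opponent, two losses to stronger ones.
new_r, new_rd = update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(low, high, round(new_r), round(new_rd, 1))
```

In this made-up period the rating drops and the RD shrinks: losing two of three games costs points, but every game played narrows the uncertainty regardless of the result.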
As of the most recent analysis, Drukhari (1617.8, 8.7) have the highest rating. Harlequins (1563.9, 11.7) have the second highest, and Genestealer Cults (1408.7, 16.9) have the lowest rating. Using these ratings, the predicted probability that Drukhari will beat Harlequins is 57.7%, and the probability that Drukhari will beat Genestealer Cults is 76.9%.
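Those probabilities can be checked with the standard Glicko expected-score formula. The sketch below assumes the prediction uses each faction’s rating and only the opponent’s RD, as described above; the function names are ours for illustration, and the actual model may differ in small details.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Discount factor based on how uncertain the opponent's rating is."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def win_probability(r, r_opp, rd_opp):
    """Expected score for a faction rated r against an opponent (r_opp, rd_opp)."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


# (rating, RD) pairs quoted in the article
ratings = {
    "Drukhari": (1617.8, 8.7),
    "Harlequins": (1563.9, 11.7),
    "Genestealer Cults": (1408.7, 16.9),
}


def predict(faction, opponent):
    r, _ = ratings[faction]
    r_opp, rd_opp = ratings[opponent]
    return win_probability(r, r_opp, rd_opp)


print(f"{predict('Drukhari', 'Harlequins'):.1%}")         # 57.7%
print(f"{predict('Drukhari', 'Genestealer Cults'):.1%}")  # 76.9%
```

Because the RDs are so small at this point, the discount factor is very close to 1 and the result is nearly identical to a plain Elo-style calculation on the rating gap.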
As the model iterates across the days observed in the data, each faction’s score will change based on whether the faction exceeds or falls short of its projected results. If it wins more games than expected, particularly against tougher opponents where it’s projected to lose, its score will increase. If it loses more than expected, particularly against weaker opponents, its score will drop.
Why should we care about Glicko scores in general?
By Charlie Anderton
Glicko scores have a number of advantages over simply looking at win rates or head-to-head results, and we, as a community, should adopt them for player rankings. Many other prominent, successful games played on a large scale already use a rating system like Glicko, Elo, or some other proprietary equivalent – chess, Magic: The Gathering, Overwatch, and League of Legends all use systems like this for individual player ranks, while USA Today and FiveThirtyEight use them for rating teams in NCAA college football and other sports. A Glicko-like rating system provides a number of benefits:
- It provides quantifiable, non-arbitrary standings
- It rewards players for taking on more difficult opponents
- It can be used to make intelligent decisions about game balance
- It can be used in both competitive and non-competitive formats
- It can enrich the player experience
- It can be easily applied to one-off games or formats outside of GTs
Rob: That last point is particularly interesting – the current ITC standings reward you for playing larger events and winning multiple games in a row, regardless of the level of competition. This often leads to UK players calling US players “soft” because of the difference in how talent is spread across events – it’s more common for a US event to have 2-3 top players against a weaker field than in the UK, where a dozen or more players may have a legitimate claim on an event. Glicko/Elo ratings solve this by making someone’s scores dependent on who they play, and not when. Theoretically Glicko rating systems don’t even need organized events, just a way to sanction individual matches. But because Glicko/Elo ratings are “zero-sum” within a community, they also encourage players to branch out and play a larger pool of players – you can’t reach grandmaster status by only beating up on the same 3-4 players in your play group.
There is no legitimate reason not to adopt this into the Warhammer community we all love and care about. Doing so would improve the health, growth, and longevity of the game.
One reason that the data Dexefiend, TheChirurgeon, and the Falcon work so tirelessly to present here and in other places is valuable is that it’s not clouded by bias and anecdotes. The data shows the current state of the game in ways not possible via other means. As it turns out, we aren’t all Drukhari-haters (yet) – it’s just that they’re that good. Odds are you’re not going to beat Dark Eldar more often than you lose to them, no matter who you are. That’s a problem for both new players and experienced players alike, and clearly demonstrable via this sort of analysis. This also makes it valuable data for examining the health of the meta and making game balance decisions. I won’t spoil the Falcon’s big Dark Technomancer reveal below, but it’s a perfect example of how this analysis should be used to balance the game.
If you’re invested enough in Warhammer 40k to be reading this, a ranking system like this could massively enrich your experience of the game. From using it for tracking your personal progress year to year – both in competitive play and against friends – to using scores as qualifiers for Championship-style or Invitational events, rating systems can be a useful tool for community engagement. Even for newer players or people we would not typically associate with competitive play, ranking systems like this can help by seeding the first round of an event to ensure Mr Newb doesn’t play a previous LVO champion and get blown out. They could also be used to match newer players in pools for local casual leagues. And they can help create incentives for players to play against other higher-ranked players, giving them cause to seek out tough matchups rather than shy away from them.
If we don’t adopt a system like this across the entire community, we’re missing out on a huge opportunity.
By Peter “The Falcon” Colosimo
Since the Drukhari release there has been an awful lot of talk, here and elsewhere, about what, if anything, needs to be addressed in the codex to relieve some of the eye-watering numbers it has been putting out over the last 6 weeks. Particular attention has fallen on Cult of Strife and Dark Technomancers: the former for its access to incredible relics, combos, and stratagems, the latter for its ability to delete elite units at effectively no cost due to how it interacts with the liquifier.
In tracking all of the competitive games that have occurred, it has been interesting to deep dive into exactly what is and isn’t working for Drukhari because, despite having that much ballyhooed 70%+ win rate, there have indeed been lists that haven’t lived up to the hype. While at this point it has become common to see Drukhari in the top 4s of any GT+ event, there are still some Drukhari lists that fall below .500, and all of those sub-.500 lists have had something in common.
Since the first events played out that allowed the use of the new codex on April 10th, there have been 56 players that have brought Drukhari out to play as their primary faction at GTs and Majors, playing a total of 324 recorded games. Of those 56 lists, 25 have run Dark Technomancers and not a single one of them has had a record worse than 3-2. In fact, lists that include Dark Technomancers are currently 119-24-1 in tournament play boasting an 83% win rate, 86.5% when you account for mirror matches. If you remove Dark Technomancer results from the current Drukhari win rate they go from a 70.2% win rate to 60.2% off of that change alone.
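The arithmetic behind those figures can be roughly reconstructed from the numbers quoted above. This is a back-of-the-envelope sketch, not the actual calculation: it assumes draws count as half a win and it ignores the mirror-match adjustment, neither of which is spelled out here, so expect it to land close to the quoted rates rather than exactly on them.

```python
def win_rate(wins, losses, draws=0, draw_value=0.5):
    """Win rate for a W-L-D record. Counting a draw as half a win is an assumption."""
    games = wins + losses + draws
    return (wins + draw_value * draws) / games


# Dark Technomancers record quoted in the article: 119-24-1
dt_wins, dt_losses, dt_draws = 119, 24, 1
dt_games = dt_wins + dt_losses + dt_draws
dt_rate = win_rate(dt_wins, dt_losses, dt_draws)

# Overall Drukhari figures quoted in the article
total_games = 324
overall_rate = 0.702

# Back out the implied win rate for every Drukhari list NOT running Dark Technomancers.
other_games = total_games - dt_games
other_points = overall_rate * total_games - (dt_wins + 0.5 * dt_draws)
other_rate = other_points / other_games

print(f"Dark Technomancers: {dt_rate:.1%} over {dt_games} games")
print(f"Everyone else:      {other_rate:.1%} over {other_games} games")
```

Under those assumptions the Dark Technomancers lists come out at about 83% and the remaining lists at roughly 60%, in line with the figures above.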
This isn’t all to say that a ‘fix’ for Dark Technomancers would be a total ‘fix’ for Drukhari; firstly because even a 60% win rate is well above a healthy number (though far more digestible than 70) and secondly because it does not account for those players currently running Dark Technomancers just moving to something else in this deep codex. What it does show is that this particular added ability is a bit of a problem, though perhaps a bit is in itself a bit of an understatement…
Which Marines Chapter is the best?
By Robert “TheChirurgeon” Jones
While the space marine armies mostly draw from the same book and units now, there are still some major differences between the supplements that can make chapters stronger or weaker. Looking at win rates gives us an idea of how marine factions stand up in the larger scheme of things since the release of Codex: Dark Angels.
Black Templars, Iron Hands, Dark Angels, and Space Wolves top the marine factions these days, though the relative strength difference between Iron Hands and Raven Guard isn’t particularly large. Marines are firmly in the middle of the pack, with most of their options sitting between 55% and 45% win rates. For the most part, these would be pretty healthy win rates in a meta without a Drukhari-like faction posting 60%+ win rates.
In addition to marines, we can look at a few other subfactions, though sample sizes quickly drop off as we talk about less popular factions. Necrons and Death Guard have been particularly popular since their new codex releases and so provide some of the most fertile data for analysis.
For Death Guard, the Ferrymen, Mortarion’s Anvil, and the Harbingers appear to be enjoying the highest win rates, though these vary pretty significantly from observed tournament results from BCP, where armies featuring the Inexorable have had the highest win rates among Death Guard armies. Win rates for the Inexorable in our ITC Battles data have been increasing steadily over the last month, so this is a trend we expect to see mirrored in the larger dataset.
Although we can’t account for custom dynasty codes that include Relentlessly Expansionist, we can see that Novokh is likely the most favorable dynasty to play otherwise, owing to a high win rate (54%) and a large enough sample size. Nihilakh also fares surprisingly well, but is at the kind of sample size where a few strong players could be skewing things.
Monofaction Chaos Space Marines armies are relatively rare in a competitive setting these days; the faction tends to get much more traction out of running with Daemons and Daemon Princes or Knights. This is particularly true for Emperor’s Children Noise Marines, and that likely explains the low performance of mono-EC armies in the dataset, though the small sample size is also a likely culprit. From a monofaction standpoint, World Eaters appear to be the most successful legion.
For Thousand Sons there’s a pretty big divide between the Cult of Magic and everyone else.
And here are a few other subfaction splits that had just enough data to look at. I’m not going to cover them all here, but it felt worth adding Orders for Sisters of Battle and Hive Fleets for Tyranids. Note that these sample sizes are relatively small, and that these armies may be misrepresented because players often mix different orders and hive fleets. There may also be an issue with under-reporting or player skill here, which may be why Order of the Bloody Rose rated lower than expected. So take these with a bit of salt – consider them to be directional or anecdotal for the time being, but interesting to mull over.
We’ve covered Secondary Objectives a few times in these columns, and it’s worth taking another look at the stats behind these for an update. There’s been a bit of discussion about whether faction secondaries are too powerful and whether they should be allowed in competitive play. Over the last three months we’ve seen three new codexes released, and two of those have contained some of the most powerful faction secondaries we’ve seen yet. When we look at the overall spread of average points between faction secondaries and the secondaries offered in the 2020 Grand Tournament Missions pack, we can see that faction secondaries average higher scores by almost one and a half points.
There are a few things to note here: First, 1.5 points isn’t a ton, given that the average margin of victory for games in the ITC Battles app is 25 points since the January 2021 FAQ. In GT events recorded in BCP the average margin of victory is just under 31 points. Second, players rarely take more than one faction secondary even when able, with the possible exception of Dark Angels, who we’ll talk about in a moment, making this advantage unlikely to compound. Third, the Mission secondaries also average about a half point more value than the standard GT secondaries. And fourth, these Codex secondaries are far from equal. Looking at the top secondaries by average VP gives us the following chart:
Priority Targets as a mission secondary is still the reigning MVP of free points, but the Drukhari objective Herd the Prey and the Dark Angels secondaries Stubborn Defiance and Death on the Wind have joined Oaths of Moment from Codex: Space Marines. Dark Angels potentially have access to all three, though it’s worth noting that the only combination of these where they can take two is Stubborn Defiance and Oaths, since every other pairing is blocked either by sharing a category or by both coming from the same book. On the whole these seem to indicate a cause for concern. However, as we’ve seen recently, neither Dark Angels nor Space Marines have demonstrated superior win rates up to this point despite having access to superior secondary objectives. And before you rush to point to Herd the Prey as a potential indicator of Drukhari dominance, you should know that Drukhari have no problem scoring Engage on All Fronts if Herd the Prey is not available – they score an average of just under 10 VP on that secondary, which gives opposing players less agency – something Skari recently addressed when talking Drukhari tactics. This is on par with the average points scored for Engage by Aeldari, Craftworlds, and Harlequins armies, so Herd the Prey is unlikely to be the source of all the Drukhari’s power when it comes to meta-warping dominance (it certainly doesn’t hurt their cause, however). Additionally, many of the top factions – Harlequins, Adepta Sororitas, Orks, Custodes, and Chaos soup lists – do not have faction secondaries and have remained among the game’s top factions.
So what we’re left with is a small set of codex secondaries that appear to be stronger than those offered in the GT missions pack. Their actual impact is likely to be relatively small, though in the case of Dark Angels and Drukhari, likely still significant. But this is where I’d argue that the problem is not so much the secondary objectives put out in Codexes, but rather the GT secondaries, many of which are just bad. And notably, two key secondaries were actively made worse in the January FAQ: the average value of Bring it Down dropped by nearly a point, while Abhor the Witch dropped significantly as well. While both of these changes were good in the larger scheme of things, they’ve weakened an already comparatively weak group of secondaries, which includes many options that average fewer than 7 points per game.
Because of this I’d suggest that the solution to secondary imbalance with Codexes is not to remove or disallow codex secondaries but rather to make the GT secondaries better. But I’ll leave the recommendations on how to do this up to a future rant by Wings.
Go-First Win Rate
In March we examined the potential impacts of changes to the first-turn roll-off and final turn scoring introduced in the January FAQ. We noted a small decrease in first turn win rates from the pre-FAQ period but noted that we’d still need more data to make any final determinations. With nearly as many games post-FAQ as prior, we now have pretty definitive results.
The January FAQ seems to have produced a slight – but statistically significant – decrease in go-first win rates overall, from 57% to 55%. This is of course still higher than the late 8th edition go-first win rate and much higher than we’d like to see; ideally go-first win rates would fall in the 48-52% range.
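For readers wondering how a two-point drop can be statistically significant, one common way to check is a two-proportion z-test. The sketch below is only an illustration of that idea: the article does not say which test was used, and the sample sizes in the example are hypothetical placeholders, not our actual game counts.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two win rates.

    p1, p2 are observed win rates; n1, n2 are the number of games behind each.
    Returns (z statistic, two-sided p-value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL sample sizes, chosen only to show how sample size drives the result.
z_big, p_big = two_proportion_z(0.57, 10_000, 0.55, 10_000)
z_small, p_small = two_proportion_z(0.57, 1_000, 0.55, 1_000)

print(f"10,000 games each: z = {z_big:.2f}, p = {p_big:.4f}")
print(f" 1,000 games each: z = {z_small:.2f}, p = {p_small:.4f}")
```

With these placeholder counts, the same 57% to 55% shift clears the usual 5% threshold at ten thousand games per period but not at one thousand, which is why the extra months of data matter for calling the result definitive.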
By mission, there have been some observably significant decreases for Scorched Earth, Vital Intelligence, and Surround and Destroy. While some missions seem to have slightly better go-first win rates than others, none in particular stand out at the moment as particularly good or bad comparatively.
This is also the same result observed at GT-level tournaments since the FAQ changes in January, where data from Best Coast Pairings for events with 5+ rounds and 28+ players shows a 55% win rate for players with the first turn. And as with prior examinations of the data, we continue to see win rates for the player with the first turn increase in later rounds where record matching causes players to be paired by skill level.
Next Month: AdMech, Probably
That does it for this month. We’ll see you again in 4 to 6 weeks or so, in time to talk about the impact of Codex: Adeptus Mechanicus. Until then, if you have any questions or feedback, drop them in the comments below or email us at email@example.com.