Start Competing: OGP and You!

I have previously written about the various pairing and placing systems that the competitive Warhammer community might use to manage its events. Since then, the fellows over at Best Coast Pairings (BCP) have announced the release of a new metric for tournament organizers. That metric, Opponent’s Game Win Percentage (OGP), is a carryover from the Magic: The Gathering community and is an essential improvement on the Strength of Schedule (SoS) metric. I previously described SoS as challenging because player drops over the course of an event have a negative impact on the measurement; OGP is designed to sidestep that problem. I had the opportunity to speak with both Paul McKelvey and Josh Diffey about the new metric, and today we’ll explain what it measures, what drawbacks it has, and how we recommend Tournament Organizers handle their pairing/placing schemes with it.

What does OGP Measure and How Does it Work?

All metrics or measurements are a way to quantify some dimension, which then enables a comparison between the resulting values. Like the Strength of Schedule metric, OGP attempts to measure the relative strength of the field a player has faced throughout an event (and thereby the strength of that player’s wins or losses) and then compares that to the rest of the same field at a given moment in time. That is to say, it asks, “How strong have your opponents been collectively to this point?” and then, “How does that compare to other players?”

Fundamentally, strength of schedule is derived from a different measure that is used to assess an opponent’s quality; commonly this is a simple W/L record. OGP starts from the same basic assumption. It takes each of your opponents’ W/L records as a win percentage, then averages those percentages together to arrive at a Strength of Schedule figure. However, owing in large part to the experiences of the Magic: The Gathering community, a floor of 33% is placed on each individual opponent’s contribution. The reason for this is the same problem that plagues the standard SoS metric in BCP: player drops. Specifically, in the competitive Magic community, teammates would on occasion weaponize SoS by dropping from an event after an early loss and thereby ‘tank’ their non-teammate opponent’s score. Instead of continuing to play following a loss, they might throw their following games or drop entirely, meaning that a player who might otherwise have won 50% or more of their games would net out at 0%. Underhanded behavior to be certain, but malice is also hard to prove. As a result, a 33% floor was instituted to disincentivize the practice and mitigate its overall impact.
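For the programmers in the audience, the calculation described above can be sketched in a few lines of Python. This is only my reading of the description, not BCP’s actual implementation: the function names are made up for illustration, draws are ignored, and an opponent with no recorded games is assumed to count as 0% before the floor is applied.

```python
from typing import Iterable, Tuple

OGP_FLOOR = 0.33  # the 33% floor described in the article


def win_percentage(wins: int, losses: int) -> float:
    """Share of games won; assumed to be 0.0 when no games were recorded."""
    games = wins + losses
    if games == 0:
        return 0.0
    return wins / games


def ogp(opponent_records: Iterable[Tuple[int, int]], floor: float = OGP_FLOOR) -> float:
    """Average the opponents' win percentages, flooring each one first.

    opponent_records holds one (wins, losses) pair per opponent faced.
    """
    floored = [max(win_percentage(w, l), floor) for w, l in opponent_records]
    if not floored:
        raise ValueError("at least one opponent record is required")
    return sum(floored) / len(floored)
```

The floor is applied to each opponent before averaging, so a single opponent who drops or collapses can only drag the average down so far.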

Perhaps the best way to illustrate this is with an example. Let’s assume two players, A and B, each play five unique opponents over the course of an event, and both go 4-1. Player A plays opponents whose final records are as follows:

• Round One Opponent: 5-0
• Round Two Opponent: 1-3
• Round Three Opponent: 2-3
• Round Four Opponent: 3-2
• Round Five Opponent: 4-1

Player A’s opponents have a raw average win rate of 61% (100% + 25% + 40% + 60% + 80%, divided by five opponents). However, as I mentioned above, we do not penalize Player A for their round two opponent having dropped, and thus apply the 33% floor to that score. As a result we arrive at an OGP of 62.6% (100% + 33% + 40% + 60% + 80%, divided by five opponents).

By comparison, Player B’s opponents’ final records are as follows:

• Round One Opponent: 2-3
• Round Two Opponent: 3-2
• Round Three Opponent: 4-1
• Round Four Opponent: 4-1
• Round Five Opponent: 3-2

Player B’s opponents have an average win rate of 64% (40% + 60% + 80% + 80% + 60%, divided by five opponents). None of them falls below the 33% floor, so Player B’s OGP is also 64%.

As such, we conclude that by OGP (64% versus 62.6%) Player B had the slightly ‘harder’ road to a 4-1 record, as B’s opponents netted one more win in total than A’s opponents (16 versus 15).
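If you’d like to check the arithmetic yourself, here is a small Python sketch that reproduces the two examples. The `ogp` helper is my own illustration of the calculation as described in this article (records as wins and losses, a 33% floor per opponent), not code taken from BCP.

```python
def ogp(records, floor=0.33):
    """Average of the opponents' win percentages, each floored at `floor`."""
    percentages = [max(wins / (wins + losses), floor) for wins, losses in records]
    return sum(percentages) / len(percentages)


# Final (wins, losses) records of each player's five opponents, from the lists above.
player_a_opponents = [(5, 0), (1, 3), (2, 3), (3, 2), (4, 1)]
player_b_opponents = [(2, 3), (3, 2), (4, 1), (4, 1), (3, 2)]

raw_a = ogp(player_a_opponents, floor=0.0)  # plain average, no floor
ogp_a = ogp(player_a_opponents)
ogp_b = ogp(player_b_opponents)

print(f"Player A raw average: {raw_a:.1%}")  # 61.0%
print(f"Player A OGP:         {ogp_a:.1%}")  # 62.6%
print(f"Player B OGP:         {ogp_b:.1%}")  # 64.0%
```

Only Player A’s round two opponent (1-3, or 25%) is affected by the floor, which is why A’s number moves from 61% to 62.6% while B’s stays at 64%.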

I note that, as anyone who has ever witnessed literally any competition of anything ever will tell you, a win or a loss doesn’t necessarily identify the better player or team; it describes only the outcome. However, anyone who has ever taken a stats course will tell you that with large enough sample sizes binary outcomes can be not only descriptive but predictive, within a context and a level of certainty/error. Herein lies the problem with all 40k metrics: there simply isn’t a large enough sample in any given single event to create a metric that doesn’t carry large uncertainty about the “true” value it is trying to measure. As a result, the error in measuring “true” opponent skill only compounds when you derive a further measure from it. However, as a community we just accept this as a given and suspend our disbelief at the event level, and I suspect this will continue to be the case until someone invents a tool or method that measures player skill better, easily and consistently.
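To put a rough number on the sample size problem, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that each game is an independent coin flip with a fixed “true” win probability, which real 40k games certainly are not. Under that assumption, the textbook standard error of an observed win rate is the square root of p(1 - p) / n.

```python
import math


def win_rate_standard_error(true_win_rate: float, games: int) -> float:
    """Standard error of an observed win rate under a simple binomial model."""
    return math.sqrt(true_win_rate * (1 - true_win_rate) / games)


# How noisy is the observed win rate of a genuinely 50/50 player?
for games in (5, 50, 500):
    error = win_rate_standard_error(0.5, games)
    print(f"{games:>3} games: +/- {error:.1%}")
#   5 games: +/- 22.4%
#  50 games: +/- 7.1%
# 500 games: +/- 2.2%
```

Over a five-round event, then, a single opponent’s record is a very blurry picture of their skill, and OGP is an average of five such blurry pictures.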

How Should OGP Be Used?

Paul will tell you that the metrics you use to manage your event’s pairing and placing schemes should be based on what you’re seeking to achieve, and he’s right! And this is where I’m going to jump on my soapbox for a moment. I’m on record as despising Battle Points (BP) as a tie-breaking metric for many reasons, but most of all for its completely arbitrary nature as a measurement. BP purports to measure the strength of a win, but in practice fails due to the randomness of the mission, the matchup, the specific pairing, or any number of other issues inherent to any single game of 40k. The reality of BP as a metric is that it often doubly rewards winning and doubly penalizes losing. It commonly penalizes the winner as well, as is seen in late-round top-table matchups or in games between two closely matched players who fight out a low-scoring match. Simply put, it’s a bad metric for measuring the strength of a single game, and we know that BP scores often do not tell the true ‘story’ of the game. The resulting injustices and leaps in ranking are seen most clearly in any top-table match where the loser plummets, often passed by people they’ve already beaten.

That said, there is no perfect system, and while I am personally a big fan of OGP and what it measures, it does have some drawbacks. First and foremost, we must acknowledge the inherent compounding error I describe above: while there isn’t a good answer among current 40k metrics, it still limits our ability to draw meaningful conclusions from a single event. Second, and as an extension of the compounding problem, OGP itself needs large(r) sample sizes to drive differentiation in its results. Even at its largest events, 40k does not achieve sample sizes large enough for the metric to be truly indicative of what it measures. However, we can and should use these numbers as faithful guideposts until a better system exists.

As a result, I think OGP is best used as a tie-break to W/L in the placing scheme of any event of five or more rounds, for top-cut purposes and/or final placings. With every additional round, the result becomes more accurate when taken in aggregate across the field. Accordingly, I’m not an advocate for using OGP at 3-round RTTs. I believe the problem of low sample sizes is just too great at a single-day event, and the impact of any single pairing will be too significant to the results. I wholeheartedly believe that OGP should replace Battle Points wherever you would otherwise have used them in GT+ events. I do not think that OGP should be used for pairing tie-breaks: it’s no better than random in the early rounds of an event, and while it may become more consistent later on, the changing accuracy of the measurement from round to round is a bit mercurial for my tastes. I much prefer the hard rule of Win Path as the W/L tie-break for pairing. Updating the recommendations from a previous article discussing pairing/placing systems, I advocate the following for tournaments seeking to derive a single champion:

Pairing: Win/Loss – Win Path – Random

Placing (5 or more rounds): Win/Loss – Opponent’s Game Win Percentage – Win Path – Battle Points

Placing (4 or fewer rounds): Win/Loss – Win Path – Battle Points

The reasoning for such a system can be found in my previous article on the topic.
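For organizers who keep their own spreadsheets or scripts, the placing orders above can be sketched as a simple sort in Python. Everything here is hypothetical: the `Standing` fields and function names are mine, `win_path` is treated as a number your event software already provides (higher is better) because this article does not describe how it is computed, and Win/Loss is approximated by total wins on the assumption that everyone played every round.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Standing:
    name: str
    wins: int
    ogp: float          # Opponent's Game Win Percentage, e.g. 0.626
    win_path: float     # hypothetical precomputed score, higher is better
    battle_points: int


def placing_key(standing: Standing, rounds: int) -> Tuple:
    """Tie-break order: OGP is only consulted at events of five or more rounds."""
    if rounds >= 5:
        return (standing.wins, standing.ogp, standing.win_path, standing.battle_points)
    return (standing.wins, standing.win_path, standing.battle_points)


def place(standings: List[Standing], rounds: int) -> List[Standing]:
    """Return standings from first place to last."""
    return sorted(standings, key=lambda s: placing_key(s, rounds), reverse=True)
```

Because Python compares tuples element by element, each later metric only comes into play when every earlier one is tied, which is exactly how a chain of tie-breakers is meant to behave.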

Final Thoughts

One thing I neglected to mention above is the truly great atmosphere that the OGP metric can generate. Unlike other metrics that actively incentivize a winning player to absolutely bury their opponent during the game and forget about them afterwards, OGP encourages a sense of community. After you play an opponent, the metric encourages you to follow up on how they’ve done and actively root for them in their remaining games. I can testify that it makes for a much more satisfying tournament atmosphere and experience, and the value of that to an organizer should not be overlooked. We’re all in this together, and OGP supports that.

Overall, a solid SoS metric has been something many tournament organizers and watchers have long sought out of the Best Coast Pairings system – it’s nice to finally have one that is both well established in other game systems (such as Magic) and is immediately applicable to the Warhammer community. While it comes with some limitations, understanding those limitations (and how it compares to other options) is key in determining whether it’s right for your event and your players.