Skip to content

League Rating System

A comprehensive rating system for tracking player skill across tournaments and leagues within DraftForge.

Overview

The league rating system tracks player performance with two parallel rating methods, handles significant MMR changes through epochs, and aggregates data across organizations for reliable skill assessment.


Features

Dual Rating System

The system tracks two parallel ratings for each player:

Rating Type Purpose Best For
Elo-style Simple, intuitive number anchored to Dota 2 MMR Display, quick comparisons
Glicko-2 Statistical rating with uncertainty tracking Accurate matchmaking, reliability assessment

Why both? Elo is familiar to players and anchors to their real MMR. Glicko-2 provides mathematical confidence in ratings, especially important for players with few games.


MMR Epoch System

Problem: Player joins at 1000 MMR, earns +200 rating over 10 games. A year later their Dota MMR is 5000. Should +200 points from playing 1000 MMR opponents count toward their 5000 MMR rating?

Solution: When a player's verified MMR changes significantly (default: 1000+ difference), a new "epoch" begins:

  • Previous rating data is archived (not deleted)
  • Fresh rating starts from new MMR baseline
  • Historical data remains accessible for reference

This ensures ratings reflect current skill level, not accumulated points from outdated games.


Flexible K-Factor Modes

K-factor determines how much each match affects a player's rating. Four modes available:

Mode Behavior Use Case
Fixed Same K for everyone Simple, predictable leagues
Placement Higher K for first N games New player calibration
Percentile Higher K for bottom players, lower for top Encourages climb, protects top ranks
Hybrid Placement first, then percentile Most balanced (recommended default)

This allows leagues to tune rating volatility based on their format and goals.


Organization-Level Rating Aggregation

Problem: Organizations run many small leagues (weekly 8-player tournaments). Individual leagues have too few games for reliable ratings.

Solution: Aggregate ratings across all leagues in an organization:

  • Combines data from multiple leagues
  • Weighted by games played and recency
  • Provides organization-wide leaderboard
  • Shows reliability indicator (10+ games OR 3+ games across 3+ leagues)

Aggregation Methods:

  • Weighted Average - Leagues weighted by games and recency
  • Bayesian Combination - Statistical combination using confidence
  • Most Recent - Use latest league's rating, sum stats

Age Decay for Matches

Problem: Matches from 2 years ago shouldn't count as much as recent matches.

Solution: Configurable half-life decay:

  • Older matches contribute less to rating changes
  • Configurable half-life (default: 180 days = 50% weight)
  • Minimum floor (default: 10% - matches never become worthless)
  • Can be disabled entirely per league

Rating Deviation (Uncertainty Tracking)

Problem: A player with 3 games and a player with 100 games may have the same rating, but we're much more confident in the 100-game player.

Solution: Glicko-2's Rating Deviation (RD):

  • High RD = high uncertainty, larger rating swings
  • Low RD = high confidence, smaller swings
  • RD increases with inactivity (uncertainty grows without data)
  • Displayed as confidence interval in UI

Configurable Display Options

Leagues can choose which rating to show:

  • Total Elo - base_mmr + positive - negative
  • Net Change - positive - negative (hides base MMR)
  • Glicko Rating - Statistical rating without MMR anchor

This accommodates different league cultures and player preferences.


Design Decisions

Separate Configuration Model

Rating config lives in a dedicated model, not embedded in League:

  • Separation of concerns
  • Leagues without ratings don't carry config baggage
  • Enables future config versioning
  • All defaults defined in ENUMs, not magic numbers

Pluggable Rating Algorithms

Rating calculation uses strategy pattern:

  • EloRatingSystem - Standard Elo with percentile K-factors
  • FixedDeltaRatingSystem - Flat points per win/loss
  • Glicko2RatingSystem - Full Glicko-2 implementation

New algorithms can be added without modifying existing code.

Match Participant Snapshots

Every match records:

  • MMR at match time
  • Rating before/after
  • K-factor and parameters used
  • Age decay applied

This enables:

  • Auditing rating changes
  • Recalculation with constraints
  • Historical analysis

Problems and Solutions

Problem Solution
Outdated MMR distorts ratings MMR Epoch system resets on significant changes
New players swing too much/too little Placement game K-factor mode
Top players lose too much to lower ranks Percentile-based K-factor scaling
Small leagues lack data for reliable ratings Organization-level aggregation
Old matches overweighted Age decay with configurable half-life
Uncertainty not visible Rating Deviation display and confidence intervals
Different leagues need different rules Fully configurable per-league settings
Rating changes hard to audit Comprehensive match participant snapshots

Metrics Tracked

Per Player Per League

  • Base MMR (Dota 2 anchor)
  • Positive/Negative stat accumulation
  • Glicko-2 rating, RD, volatility
  • Games/Wins/Losses
  • Current epoch number
  • Last played timestamp

Per Organization Per Player

  • Aggregated rating across all org leagues
  • Total games/wins/losses across org
  • Leagues participated count
  • Reliability indicator

Per Match Per Participant

  • Rating before/after
  • K-factor used
  • Age decay applied
  • RD at calculation time
  • Win/loss result and delta

Configuration Defaults

Setting Default Description
K-factor (default) 32 Standard rating swing
K-factor (placement) 64 Higher volatility for new players
K-factor (bottom 5%) 40 Faster climb for low-rated players
K-factor (top 5%) 16 More stability for top players
Placement games 10 Games before standard K applies
Min games for ranking 3 Required to appear on leaderboard
Age decay half-life 180 days When matches count 50%
Glicko initial RD 350 Starting uncertainty
MMR epoch threshold 1000 MMR change triggering reset

All defaults stored in RatingDefaults enum - no magic numbers in code.