League Rating System

A comprehensive rating system for tracking player skill across tournaments and leagues within DraftForge.

Overview

The league rating system tracks player performance with two parallel rating methods, handles significant MMR changes through epochs, and aggregates data across all leagues within an organization for more reliable skill assessment.

Features

Dual Rating System

The system tracks two parallel ratings for each player:

| Rating Type | Purpose | Best For |
|---|---|---|
| Elo-style | Simple, intuitive number anchored to Dota 2 MMR | Display, quick comparisons |
| Glicko-2 | Statistical rating with uncertainty tracking | Accurate matchmaking, reliability assessment |

Why both? Elo is familiar to players and is anchored to their real MMR. Glicko-2 additionally measures how confident the system is in each rating, which matters most for players with few games.

MMR Epoch System

Problem: A player joins at 1000 MMR and earns +200 rating over 10 games. A year later, their Dota MMR is 5000. Should the +200 points earned against 1000-MMR opponents still count toward their rating at 5000 MMR?

Solution: When a player's verified MMR changes significantly (default: a difference of 1000 or more), a new "epoch" begins:

- Previous rating data is archived (not deleted)
- A fresh rating starts from the new MMR baseline
- Historical data remains accessible for reference

This ensures ratings reflect current skill level, not accumulated points from outdated games.
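The epoch check can be sketched as follows. This is a minimal illustration, not DraftForge's actual model: `EpochRating`, `PlayerRatingHistory`, and `apply_verified_mmr` are hypothetical names, and it assumes the new MMR is compared against the base MMR the current epoch started from, with "archiving" modeled as appending the old epoch to a list.

```python
from dataclasses import dataclass, field

MMR_EPOCH_THRESHOLD = 1000  # default from the configuration table


@dataclass
class EpochRating:
    """Rating state for one epoch (hypothetical structure)."""
    epoch: int
    base_mmr: int
    positive: float = 0.0  # points gained during this epoch
    negative: float = 0.0  # points lost during this epoch


@dataclass
class PlayerRatingHistory:
    current: EpochRating
    archived: list[EpochRating] = field(default_factory=list)

    def apply_verified_mmr(self, new_mmr: int,
                           threshold: int = MMR_EPOCH_THRESHOLD) -> bool:
        """Start a new epoch if verified MMR moved by `threshold` or more.

        Returns True when a new epoch was started.
        """
        if abs(new_mmr - self.current.base_mmr) < threshold:
            return False
        self.archived.append(self.current)  # archive the old epoch, never delete it
        self.current = EpochRating(epoch=self.current.epoch + 1, base_mmr=new_mmr)
        return True
```

With the example from the problem statement, a player holding `EpochRating(epoch=1, base_mmr=1000, positive=200)` who verifies at 5000 MMR moves to epoch 2 with `base_mmr=5000` and no carried-over points, while the epoch-1 record stays in `archived`. Comparing against the epoch's base (rather than the previous verification) means gradual MMR drift also eventually triggers a reset.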

Flexible K-Factor Modes

The K-factor determines how much each match affects a player's rating. Four modes are available:

| Mode | Behavior | Use Case |
|---|---|---|
| Fixed | Same K for everyone | Simple, predictable leagues |
| Placement | Higher K for a player's first N games | New player calibration |
| Percentile | Higher K for the lowest-rated players, lower K for the highest-rated | Encourages climbing, protects top ranks |
| Hybrid | Placement first, then percentile | Most balanced (recommended default) |

This allows leagues to tune rating volatility based on their format and goals.
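The mode selection might look like the sketch below, using the documented defaults (K = 32 standard, 64 during the first 10 placement games, 40 for the bottom 5%, 16 for the top 5%). `KFactorMode` and `k_factor` are hypothetical names, and expressing a player's standing as a percentile from 0 (lowest rated) to 100 (highest rated) is an assumption.

```python
from enum import Enum


class KFactorMode(Enum):
    FIXED = "fixed"
    PLACEMENT = "placement"
    PERCENTILE = "percentile"
    HYBRID = "hybrid"


def k_factor(mode: KFactorMode, games_played: int, percentile: float,
             k_default: int = 32, k_placement: int = 64,
             k_bottom: int = 40, k_top: int = 16,
             placement_games: int = 10, band: float = 5.0) -> int:
    """Pick K for one player. `percentile` runs from 0 (lowest) to 100 (highest)."""
    in_placement = games_played < placement_games

    def by_percentile() -> int:
        if percentile <= band:
            return k_bottom  # bottom 5%: faster climb
        if percentile >= 100 - band:
            return k_top  # top 5%: more stability
        return k_default

    if mode is KFactorMode.FIXED:
        return k_default
    if mode is KFactorMode.PLACEMENT:
        return k_placement if in_placement else k_default
    if mode is KFactorMode.PERCENTILE:
        return by_percentile()
    # HYBRID: placement K for the first games, then percentile scaling
    return k_placement if in_placement else by_percentile()
```

For example, a top-ranked player on their 3rd game gets 64 under Hybrid (still placing) but 16 once past 10 games.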

Organization-Level Rating Aggregation

Problem: Organizations run many small leagues (for example, weekly 8-player tournaments). Individual leagues have too few games for reliable ratings.

Solution: Aggregate ratings across all leagues in an organization:

- Combines data from multiple leagues
- Weights each league by games played and recency
- Provides an organization-wide leaderboard
- Shows a reliability indicator (10+ games, OR 3+ games across 3+ leagues)

Aggregation Methods:

- Weighted Average: leagues weighted by games and recency
- Bayesian Combination: statistical combination using each rating's confidence
- Most Recent: use the latest league's rating and sum stats across leagues
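The Weighted Average method could be sketched as below. The exact weighting is an assumption: each league's weight here is its game count times a recency factor that halves every 180 days since the league's last game (borrowing the age-decay half-life default). The reliability check follows the text but reads "3+ games across 3+ leagues" as at least 3 games in each of at least 3 leagues, which is also an assumption. `LeagueRating`, `weighted_average`, and `is_reliable` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LeagueRating:
    rating: float
    games: int
    days_since_last_game: float


def weighted_average(leagues: list[LeagueRating], half_life_days: float = 180.0) -> float:
    """Org-wide rating: each league weighted by games played times recency."""
    total_weight = 0.0
    weighted_sum = 0.0
    for lg in leagues:
        recency = 0.5 ** (lg.days_since_last_game / half_life_days)
        weight = lg.games * recency
        weighted_sum += weight * lg.rating
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no games to aggregate")
    return weighted_sum / total_weight


def is_reliable(leagues: list[LeagueRating]) -> bool:
    """10+ total games, OR 3+ games in each of 3+ leagues (assumed reading)."""
    total_games = sum(lg.games for lg in leagues)
    leagues_with_3 = sum(1 for lg in leagues if lg.games >= 3)
    return total_games >= 10 or leagues_with_3 >= 3
```

Two 4-game leagues rated 3000 and 3200 aggregate to 3100 if both are current; if the second league last played 180 days ago, its weight halves and the result shifts toward 3000.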

Age Decay for Matches

Problem: Matches from two years ago should not count as much as recent matches.

Solution: Configurable half-life decay:

- Older matches contribute less to rating changes
- Configurable half-life (default: 180 days, so a 180-day-old match carries 50% weight)
- Minimum floor (default: 10%, so matches never become worthless)
- Can be disabled entirely per league
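Assuming exponential decay (the usual meaning of a half-life), a match's weight can be computed as below. `age_decay_weight` is a hypothetical name, and how the weight is applied (for instance, multiplying the match's rating delta by it) is an assumption the text does not spell out.

```python
def age_decay_weight(age_days: float, half_life_days: float = 180.0,
                     floor: float = 0.10, enabled: bool = True) -> float:
    """Weight in [floor, 1] for a match that is `age_days` old."""
    if not enabled:
        return 1.0  # decay disabled for this league
    weight = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    return max(floor, weight)  # never drop below the floor


print(age_decay_weight(0))    # 1.0
print(age_decay_weight(180))  # 0.5
print(age_decay_weight(730))  # 0.1 (about 0.06 before clamping to the floor)
```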

Rating Deviation (Uncertainty Tracking)

Problem: A player with 3 games and a player with 100 games may have the same rating, but we can be much more confident in the 100-game player's rating.

Solution: Glicko-2's Rating Deviation (RD):

- High RD = high uncertainty, larger rating swings
- Low RD = high confidence, smaller swings
- RD increases with inactivity (uncertainty grows without new data)
- Displayed as a confidence interval in the UI
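Inactivity growth follows the standard Glicko-2 rule for a rating period with no games: on the Glicko-2 scale (RD divided by 173.7178), the deviation grows as φ' = √(φ² + σ²), where σ is the player's volatility. The sketch applies that once per idle period. The 0.06 volatility, the cap at the initial RD of 350, and the ±1.96·RD 95% interval are common conventions assumed here, not settings confirmed for DraftForge; the function names are hypothetical.

```python
import math

GLICKO2_SCALE = 173.7178  # converts between the Glicko and Glicko-2 scales
INITIAL_RD = 350.0        # documented default starting uncertainty


def rd_after_inactivity(rd: float, volatility: float = 0.06,
                        idle_periods: int = 1, max_rd: float = INITIAL_RD) -> float:
    """Grow RD for rating periods without games (Glicko-2 step 6, repeated)."""
    phi = rd / GLICKO2_SCALE
    # Each idle period adds sigma^2 to phi^2; volatility itself is unchanged.
    phi = math.sqrt(phi ** 2 + idle_periods * volatility ** 2)
    return min(phi * GLICKO2_SCALE, max_rd)  # cap at initial RD (assumed convention)


def confidence_interval(rating: float, rd: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for the player's true rating."""
    return rating - z * rd, rating + z * rd
```

A settled player with RD 50 who sits out 12 rating periods ends up with an RD of roughly 62, so the interval shown in the UI widens accordingly.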

Configurable Display Options

Leagues can choose which rating to show:

- Total Elo: base_mmr + positive - negative
- Net Change: positive - negative (hides base MMR)
- Glicko Rating: statistical rating without the MMR anchor

This accommodates different league cultures and player preferences.
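Given the formulas above, the three display modes reduce to a simple switch. `DisplayMode` and `displayed_rating` are hypothetical names, and `positive` and `negative` are assumed to be accumulated rating points gained and lost, both stored as non-negative numbers.

```python
from enum import Enum


class DisplayMode(Enum):
    TOTAL_ELO = "total_elo"
    NET_CHANGE = "net_change"
    GLICKO = "glicko"


def displayed_rating(mode: DisplayMode, base_mmr: float, positive: float,
                     negative: float, glicko_rating: float) -> float:
    """Value shown on the leaderboard for the league's chosen display mode."""
    if mode is DisplayMode.TOTAL_ELO:
        return base_mmr + positive - negative
    if mode is DisplayMode.NET_CHANGE:
        return positive - negative  # base MMR hidden
    return glicko_rating  # no MMR anchor
```

A player with base MMR 3000, +120 and -40 shows 3080 under Total Elo and 80 under Net Change.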

Design Decisions

Separate Configuration Model

Rating config lives in a dedicated model rather than being embedded in League:

- Separation of concerns
- Leagues without ratings don't carry config baggage
- Enables future config versioning
- All defaults defined in enums, not magic numbers

Pluggable Rating Algorithms

Rating calculation uses the strategy pattern:

- EloRatingSystem: standard Elo with percentile K-factors
- FixedDeltaRatingSystem: flat points per win/loss
- Glicko2RatingSystem: full Glicko-2 implementation

New algorithms can be added without modifying existing code.
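A minimal sketch of the strategy interface follows. The class names `EloRatingSystem` and `FixedDeltaRatingSystem` come from the list above, but the `RatingSystem` base class, the `delta` signature, `apply_match`, and the 25-point fixed delta are hypothetical. K is held constant here rather than chosen by percentile, and `Glicko2RatingSystem` is omitted because it also needs RD and volatility, so a real interface likely carries more state than this one.

```python
from abc import ABC, abstractmethod


class RatingSystem(ABC):
    """Common interface: callers depend on this, not on a concrete algorithm."""

    @abstractmethod
    def delta(self, rating: float, opponent_rating: float, won: bool) -> float:
        """Rating change for one player from one match."""


class EloRatingSystem(RatingSystem):
    def __init__(self, k: float = 32.0) -> None:
        self.k = k

    def delta(self, rating: float, opponent_rating: float, won: bool) -> float:
        # Standard logistic expected score on a 400-point scale.
        expected = 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))
        return self.k * ((1.0 if won else 0.0) - expected)


class FixedDeltaRatingSystem(RatingSystem):
    def __init__(self, points: float = 25.0) -> None:
        self.points = points

    def delta(self, rating: float, opponent_rating: float, won: bool) -> float:
        return self.points if won else -self.points  # ignores opponent strength


def apply_match(system: RatingSystem, rating: float,
                opponent_rating: float, won: bool) -> float:
    return rating + system.delta(rating, opponent_rating, won)
```

Adding a new algorithm means adding one more `RatingSystem` subclass; `apply_match` and its callers stay untouched.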

Match Participant Snapshots

Every match records, for each participant:

- MMR at match time
- Rating before/after
- K-factor and parameters used
- Age decay applied

This enables:

- Auditing rating changes
- Recalculation with constraints
- Historical analysis
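A per-participant snapshot could be modeled as an immutable record like the one below. The field names are hypothetical, chosen to mirror the lists in this document. `find_breaks` (also hypothetical) illustrates one kind of audit: within one player's history for an epoch, each match's `rating_before` should pick up exactly where the previous match's `rating_after` left off.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchParticipantSnapshot:
    match_id: int
    mmr_at_match: int
    rating_before: float
    rating_after: float
    k_factor: float
    age_decay: float          # weight applied to this match (1.0 = no decay)
    rd_at_calculation: float  # Glicko-2 RD when the match was rated
    won: bool

    @property
    def delta(self) -> float:
        return self.rating_after - self.rating_before


def find_breaks(history: list[MatchParticipantSnapshot]) -> list[int]:
    """Match ids whose rating_before does not continue from the prior rating_after.

    `history` is one player's snapshots in chronological order within one epoch.
    """
    return [cur.match_id for prev, cur in zip(history, history[1:])
            if not math.isclose(cur.rating_before, prev.rating_after)]
```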

Problems and Solutions

| Problem | Solution |
|---|---|
| Outdated MMR distorts ratings | MMR Epoch system resets on significant changes |
| New players swing too much/too little | Placement game K-factor mode |
| Top players lose too much to lower ranks | Percentile-based K-factor scaling |
| Small leagues lack data for reliable ratings | Organization-level aggregation |
| Old matches overweighted | Age decay with configurable half-life |
| Uncertainty not visible | Rating Deviation display and confidence intervals |
| Different leagues need different rules | Fully configurable per-league settings |
| Rating changes hard to audit | Comprehensive match participant snapshots |

Metrics Tracked

Per Player Per League

- Base MMR (Dota 2 anchor)
- Positive/negative stat accumulation
- Glicko-2 rating, RD, volatility
- Games/wins/losses
- Current epoch number
- Last played timestamp

Per Organization Per Player

- Aggregated rating across all org leagues
- Total games/wins/losses across org
- Leagues participated count
- Reliability indicator

Per Match Per Participant

- Rating before/after
- K-factor used
- Age decay applied
- RD at calculation time
- Win/loss result and delta

Configuration Defaults

| Setting | Default | Description |
|---|---|---|
| K-factor (default) | 32 | Standard rating swing |
| K-factor (placement) | 64 | Higher volatility for new players |
| K-factor (bottom 5%) | 40 | Faster climb for low-rated players |
| K-factor (top 5%) | 16 | More stability for top players |
| Placement games | 10 | Games before the standard K applies |
| Min games for ranking | 3 | Games required to appear on the leaderboard |
| Age decay half-life | 180 days | Age at which a match counts 50% |
| Glicko initial RD | 350 | Starting uncertainty |
| MMR epoch threshold | 1000 | MMR change that starts a new epoch |

All defaults are stored in the RatingDefaults enum, so there are no magic numbers in code.
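One way the enum and a separate per-league config could fit together is sketched below. The `RatingDefaults` name comes from this document, but its member names are hypothetical, and `LeagueRatingConfig` is a plain dataclass standing in for the dedicated config model. The 10% age-decay floor comes from the Age Decay section rather than the table.

```python
from dataclasses import dataclass
from enum import Enum, unique


@unique  # raises ValueError at import time if two members share a value
class RatingDefaults(Enum):
    K_FACTOR_DEFAULT = 32
    K_FACTOR_PLACEMENT = 64
    K_FACTOR_BOTTOM_5_PERCENT = 40
    K_FACTOR_TOP_5_PERCENT = 16
    PLACEMENT_GAMES = 10
    MIN_GAMES_FOR_RANKING = 3
    AGE_DECAY_HALF_LIFE_DAYS = 180
    AGE_DECAY_FLOOR = 0.10  # from the Age Decay section
    GLICKO_INITIAL_RD = 350
    MMR_EPOCH_THRESHOLD = 1000


@dataclass
class LeagueRatingConfig:
    """Per-league rating settings; every field falls back to a RatingDefaults value."""
    k_factor_default: int = RatingDefaults.K_FACTOR_DEFAULT.value
    k_factor_placement: int = RatingDefaults.K_FACTOR_PLACEMENT.value
    k_factor_bottom: int = RatingDefaults.K_FACTOR_BOTTOM_5_PERCENT.value
    k_factor_top: int = RatingDefaults.K_FACTOR_TOP_5_PERCENT.value
    placement_games: int = RatingDefaults.PLACEMENT_GAMES.value
    min_games_for_ranking: int = RatingDefaults.MIN_GAMES_FOR_RANKING.value
    age_decay_half_life_days: int = RatingDefaults.AGE_DECAY_HALF_LIFE_DAYS.value
    age_decay_floor: float = RatingDefaults.AGE_DECAY_FLOOR.value
    glicko_initial_rd: int = RatingDefaults.GLICKO_INITIAL_RD.value
    mmr_epoch_threshold: int = RatingDefaults.MMR_EPOCH_THRESHOLD.value
```

A league that overrides nothing, `LeagueRatingConfig()`, picks up every documented default; `LeagueRatingConfig(placement_games=5)` changes only that one setting. Python enums treat members with equal values as aliases of the first, which is why the sketch adds `@unique`: a future default that collides with an existing value fails loudly instead of silently aliasing.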