Establishing a Data-Driven Grading System
When players complete a maze run, they receive a grade based on their completion time. The original grade thresholds were set from a handful of runs during development. The data collected since then shows that players quickly surpassed those times, so most runs were graded S++.
Below, we show how we would update the grading criteria using the distribution of times across the completed runs collected so far, much like grading on a curve in a classroom. Percentiles are ranked by speed, so the 90-100th percentile band is the fastest 10% of runs. Completion times in the dataset are tightly clustered, so neighboring grade boundaries can be less than a second apart.
Grade | Time (seconds) | Distribution Cut |
---|---|---|
S++ | < 30.20 | Top Performers (Beyond current best) |
A+ | ≥ 30.20 and < 32.22 | 90-100th percentile |
A | ≥ 32.22 and < 32.45 | 80-90th percentile |
B | ≥ 32.45 and < 33.36 | 60-80th percentile |
C | ≥ 33.36 and < 34.67 | 40-60th percentile |
D | ≥ 34.67 and < 37.17 | 20-40th percentile |
F | ≥ 37.17 | Below 20th percentile |
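The percentile cuts above can be reproduced with a short script. This is a minimal sketch, not the code behind the table: it assumes completion times are available as a plain list of seconds, the helper names `percentile_cutoffs` and `grade_for` are hypothetical, and it uses the inclusive (linear interpolation) method of Python's `statistics.quantiles` because the text does not say how percentiles were interpolated, so recomputed cutoffs may differ slightly. Since lower times are better, the fastest 10% of runs (A+) are those below the 10th percentile of the time distribution. The hand-picked S++ tier is left out.

```python
import bisect
import statistics

# Bands from the table: fastest 10% -> A+, next 10% -> A, then 20% bands
# for B, C and D, and the slowest 20% -> F. (S++ is set by hand, not here.)
GRADES = ["A+", "A", "B", "C", "D", "F"]

# statistics.quantiles(..., n=10) returns 9 decile cut points:
# index 0 is the 10th percentile of times, index 1 the 20th, ..., index 8 the 90th.
# We need the 10th, 20th, 40th, 60th and 80th percentiles of completion time.
DECILE_INDICES = [0, 1, 3, 5, 7]


def percentile_cutoffs(times: list[float]) -> list[float]:
    """Return ascending time cutoffs (seconds) separating A+/A/B/C/D/F."""
    deciles = statistics.quantiles(times, n=10, method="inclusive")
    return [deciles[i] for i in DECILE_INDICES]


def grade_for(time_s: float, cutoffs: list[float]) -> str:
    """Lower is better; a time equal to a cutoff falls in the slower band."""
    return GRADES[bisect.bisect_right(cutoffs, time_s)]


if __name__ == "__main__":
    # Cutoffs quoted in the table above, used here to illustrate the lookup.
    table_cutoffs = [32.22, 32.45, 33.36, 34.67, 37.17]
    print(grade_for(32.30, table_cutoffs))  # A
    print(grade_for(38.00, table_cutoffs))  # F
```

`bisect_right` returns the number of cutoffs at or below a time, so a run of exactly 32.22 seconds lands in A rather than A+, matching the table's "≥ ... and <" convention.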
Refining the Grading System
Our winning-runs dataset shows a Pareto-like distribution, with a few prolific players accounting for most of the completed runs. In response, we normalized the data: to make it more representative without binning players by type, we capped any single player's contribution at five winning runs, randomly sampled when a player has more. This curtails the overrepresentation of frequent players and helps the grading system reflect the broader player community. We hope the result is a more forgiving grade scale that still motivates players to set new, better times.
Grade | Time (seconds) | Distribution Cut |
---|---|---|
S++ | < 30.00 | Exceptional (Beyond current best) |
S+ | ≥ 30.00 and < 31.00 | Elite (Top performers) |
S | ≥ 31.00 and < 32.00 | Excellent |
A+ | ≥ 32.00 and < 33.45 | 90-100th percentile |
A | ≥ 33.45 and < 34.55 | 80-90th percentile |
B | ≥ 34.55 and < 35.72 | 60-80th percentile |
C | ≥ 35.72 and < 37.22 | 40-60th percentile |
D | ≥ 37.22 and < 42.52 | 20-40th percentile |
F | ≥ 42.52 | Below 20th percentile |
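The per-player cap described in the refined approach can be sketched as follows. This is an illustration under stated assumptions: winning runs are taken as `(player_id, time_s)` pairs, and the function name `cap_runs_per_player` and its `seed` parameter are hypothetical. Players with five or fewer runs keep all of them; anyone with more has five runs drawn uniformly at random without replacement.

```python
import random
from collections import defaultdict

MAX_RUNS_PER_PLAYER = 5  # cap stated in the text


def cap_runs_per_player(runs, max_runs=MAX_RUNS_PER_PLAYER, seed=None):
    """Limit each player to at most `max_runs` winning runs.

    `runs` is an iterable of (player_id, time_s) pairs. Players at or under
    the cap keep every run; players over it get a uniform random sample of
    `max_runs` runs, drawn without replacement.
    """
    rng = random.Random(seed)  # private generator so `seed` gives reproducible samples
    by_player = defaultdict(list)
    for player_id, time_s in runs:
        by_player[player_id].append(time_s)

    capped = []
    for player_id, times in by_player.items():
        if len(times) > max_runs:
            times = rng.sample(times, max_runs)
        capped.extend((player_id, t) for t in times)
    return capped


if __name__ == "__main__":
    # Hypothetical data: one prolific player and one casual player.
    runs = [("prolific", 30.0 + i) for i in range(20)] + [("casual", 40.5), ("casual", 41.2)]
    capped = cap_runs_per_player(runs, seed=42)
    print(len(capped))  # 7: five sampled runs plus both casual runs
```

Passing a fixed `seed` makes the sample reproducible, which helps if the cutoffs ever need to be regenerated; the capped times can then be ranked into percentile bands as before.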
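The impact of this change can be seen in an example: a raw time of 30 seconds with 5 cheats was previously adjusted to 41.87 seconds. Under the new system, the adjusted time is 42.69 seconds. Judged against the refined thresholds, 41.87 seconds would earn a D (≥ 37.22 and < 42.52), whereas 42.69 seconds lands just past the F cutoff of 42.52 seconds, so the new adjustment moves this run across a grade boundary in line with the more representative grading criteria.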
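The example can be checked against the refined table with a small lookup. The text does not describe how cheats are converted into a time penalty, so this sketch starts from the adjusted times it quotes; the names `NEW_CUTOFFS`, `NEW_GRADES`, and `grade` are hypothetical. Each cutoff is the exclusive upper bound of a tier, so `bisect_right` counts how many bounds a time has reached or passed and uses that count as the tier index.

```python
import bisect

# Exclusive upper bounds (seconds) of each tier in the refined table, fastest first.
NEW_CUTOFFS = [30.00, 31.00, 32.00, 33.45, 34.55, 35.72, 37.22, 42.52]
# One more grade than cutoffs: anything at or above 42.52 s is an F.
NEW_GRADES = ["S++", "S+", "S", "A+", "A", "B", "C", "D", "F"]


def grade(adjusted_time_s: float) -> str:
    """Map an adjusted completion time to its tier under the refined table."""
    # bisect_right counts the cutoffs <= the time, so a time equal to a
    # cutoff falls into the slower tier, matching the table's ">=" bounds.
    return NEW_GRADES[bisect.bisect_right(NEW_CUTOFFS, adjusted_time_s)]


if __name__ == "__main__":
    print(grade(41.87))  # D: previous adjusted time
    print(grade(42.69))  # F: new adjusted time
```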