posted by martyb on Thursday November 26 2015, @10:31PM   Printer-friendly
from the chuck-norris-always-wins dept.

Summary: I describe how the TrueSkill algorithm works using concepts you're already familiar with. TrueSkill is used on Xbox Live to rank and match players and it serves as a great way to understand how statistical machine learning is actually applied today. I've also created an open source project where I implemented TrueSkill three different times in increasing complexity and capability. In addition, I've created a detailed supplemental math paper that works out equations that I gloss over here. Feel free to jump to sections that look interesting and ignore ones that seem boring. Don't worry if this post seems a bit long, there are lots of pictures.

[...] Skill is tricky to measure. Being good at something takes deliberate practice and sometimes a bit of luck. How do you measure that in a person? You could just ask someone if they're skilled, but this would only give a rough approximation since people tend to be overconfident in their ability. Perhaps a better question is "what would the units of skill be?" For something like the 100 meter dash, you could just average the number of seconds of several recent sprints. However, for a game like chess, it's harder because all that's really important is if you win, lose, or draw.

It might make sense to just tally the total number of wins and losses, but this wouldn't be fair to people that played a lot (or a little). Slightly better is to record the percent of games that you win. However, this wouldn't be fair to people that beat up on far worse players or players who got decimated but maybe learned a thing or two. The goal of most games is to win, but if you win too much, then you're probably not challenging yourself. Ideally, if all players won about half of their games, we'd say things are balanced. In this ideal scenario, everyone would have a near 50% win ratio, making it impossible to compare using that metric.

Finding universal units of skill is too hard, so we'll just give up and not use any units. The only thing we really care about is roughly who's better than whom and by how much. One way of doing this is coming up with a scale where each person has a unit-less number expressing their rating that you could use for comparison. If a player has a skill rating much higher than someone else, we'd expect them to win if they played each other.
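The unit-less rating idea can be made concrete with a small sketch: map the gap between two ratings to an expected win probability. The logistic curve and the 400-point scale below are illustrative choices borrowed from Elo-style systems, not something the article prescribes:

```python
def win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that player A beats player B, given only their
    unit-less ratings. A larger rating gap means a more lopsided
    expectation; equal ratings give exactly 50%."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

# A player rated 200 points higher is expected to win about 76% of the time.
print(round(win_probability(1700, 1500), 2))  # → 0.76
print(win_probability(1500, 1500))            # → 0.5
```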

Older article from 2010, but still interesting.


  • (Score: 2) by Snow on Thursday November 26 2015, @10:47PM

    by Snow (1601) on Thursday November 26 2015, @10:47PM (#268449) Journal

    Take an FPS for example.

    Skill could be measured by the time between the player spotting an enemy and getting the reticule on the enemy. This could be further refined by factoring in how close to the head the reticule is.
    Number of shots fired in the vicinity of the target / number of hits.
    Avg shots/kill.

    For an RTS, you could use an aggregate of actions per minute, k/d ratio (units, not games), hp healed (a possible indicator of micromanagement), fights started (I find better players are more likely to start a fight).

    Racing games are easy (Time - errors)

    In DOTA2, they have a web-graph that shows your play style. It has (from memory) Team Fighting, Support, Farming, Flexibility, and something else. That could be used to determine skill to some degree.

    There are lots of ways to measure skill, but when you have a large group of players, it's just easier to throw everyone in the same pool, and rank them up when they win, and rank them down when they lose. Eventually they will probably land in their slot.
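The "rank up when they win, rank down when they lose" scheme described above can be sketched in a few lines. The starting rating and fixed step size are arbitrary illustrative choices, not taken from any particular game:

```python
def update_ratings(winner: float, loser: float, step: float = 25.0):
    """Naive fixed-step update: the winner's rating goes up by a
    constant amount and the loser's goes down by the same amount,
    regardless of how strong either player was."""
    return winner + step, loser - step

r_alice, r_bob = 1000.0, 1000.0
for _ in range(3):                 # Alice wins three games in a row
    r_alice, r_bob = update_ratings(r_alice, r_bob)
print(r_alice, r_bob)              # → 1075.0 925.0
```

After enough games, players drift toward a rating where wins and losses balance, which is the "eventually they will probably land in their slot" behavior the comment describes.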

    • (Score: 4, Insightful) by mth on Friday November 27 2015, @12:24AM

      by mth (2848) on Friday November 27 2015, @12:24AM (#268461) Homepage

      I'll focus on RTS, since that's a genre that I actually play (StarCraft 2):

      For an RTS, you could use an aggregate of actions per minute, k/d ratio (units, not games), hp healed (a possible indicator of micromanagement), fights started (I find better players are more likely to start a fight).

      There are a lot of ways those assumptions can go wrong.

      While better players have higher actions per minute on average, it's certainly not the case that increasing APM means you'll play better. Being able to prioritize actions is more important than being able to do more actions.

      For k/d ratio, it would have to be in terms of unit cost (resources lost) rather than unit count, but even then a skillful player might realize they've got a superior economy and throw units into unfavorable engagements to keep the opponent from increasing their income. In StarCraft, a Zerg player typically is less efficient in spending resources, but is able to gather more to compensate.

      Micromanagement can make a difference if comparable army sizes collide, but if the opponent shows up with an army twice the size of yours, it's unlikely good micro is going to save you. This might be different in games where the income isn't as variable as in StarCraft.

      The real skill in starting fights is to know when a fight will favor you. Pro players will exchange shots frequently, but most of the time back out of a fight when they realize they're not in a position to decisively win it. Unskilled players might shy away from fights, but they can also commit to fights that don't favor them. Also, some players only do rush attacks since they know they're not good at mid/late game macro, which one could consider a skill and a lack of skill at the same time.

      Overall, I think the things that are easiest to measure are not the most important indicators of skill. It might be better to just stick to win/loss: judge a play style on the results it gets, not on any of its elements.

      That said, there is one problem with using win/loss that could be mitigated by looking at other stats: win/loss only becomes reliable after playing a number of matches. Using stats like APM, income generated, resources lost etc. might be useful to get a quicker estimate of a player's skill to avoid matching them against players of a very different skill level before win/loss becomes reliable. While this concerns only a limited period in a player's 'career', first impressions are important: I imagine more than a few players give up on player-versus-player matches altogether after getting crushed by far more experienced players during their first few matches.

      • (Score: 0) by Anonymous Coward on Friday November 27 2015, @02:38PM

        by Anonymous Coward on Friday November 27 2015, @02:38PM (#268666)

        I agree with what you are saying. It did get me thinking of statistics and measuring in general, be it "who is a good programmer," "which is a good company to invest money in," and "who is a good baseball player."

        Everything you say about how the statistics reflect truth, but can easily be gamed or misrepresented based on circumstances (e.g. a good player has high APM, but high APM doesn't necessarily make a player good), applies equally to the other fields I listed above. A company could have a good cash flow because business is good... or because they just took on a massive loan. A programmer could write a lot of SLOC because they are really talented and working on a hard feature... or because they are shoveling out unoptimized garbage.

        They've mostly figured this out for evaluating business worth, in that there are many well-paid accountants who can judge how good a company is. They've mostly figured this out for baseball, as demonstrated by Moneyball. I imagine they could do the same for any given video game, only there are so many of them and they are so short lived that nobody is willing to dedicate the time and resources to do so.

        It does make me wonder how come they haven't figured out any good mechanisms for calculating how good a software program or a software programmer is, though.

  • (Score: 2) by darkfeline on Friday November 27 2015, @12:19AM

    by darkfeline (1030) on Friday November 27 2015, @12:19AM (#268460) Homepage

    I think this kind of thing is common knowledge now (TFS mentions it's from 2010 too). I wouldn't really call it measuring skill, though; it's more like using historical data and statistics to predict the future.

    What's the difference? Consider a player who is skilled at using weapon A and another player who is skilled at using weapon B, both ranked about the same; however, weapon A in general trumps weapon B (and B trumps C, C trumps A, so the game is balanced). Player A would beat player B every time, but that nuance is not something that can be captured by a statistical prediction rank, whereas it would be captured by a true skill ranking system.

    --
    Join the SDF Public Access UNIX System today!
    • (Score: 2) by mhajicek on Friday November 27 2015, @06:53AM

      by mhajicek (51) on Friday November 27 2015, @06:53AM (#268542)

      I had this in SCA swordfighting a number of years ago. A friend and I both specialized in sword and shield, but he could beat me 2/3. He had trouble defending against another friend who used two-sword, who could beat him 2/3. I didn't have that problem and could beat the two-sword friend 2/3.

      --
      The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
  • (Score: 2) by bart9h on Friday November 27 2015, @02:36AM

    by bart9h (767) on Friday November 27 2015, @02:36AM (#268489)

    Judging from TFS alone, it seems similar to the algorithm used by FIBS (First Internet Backgammon Server): everybody starts at a fixed value (1500). The number of points each match adds to one player and subtracts from the other is (roughly) proportional to the difference between their scores. That way, if I win against a better player than me, I gain a lot of points and he loses a lot. But if I win against a weaker player, I gain only a few points, and he loses only a few too.
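The FIBS behavior described above matches a classic Elo-style update, where the point exchange grows with the rating gap. This is a sketch only; the k-factor of 32 and the 400-point scale are conventional Elo choices, not necessarily FIBS's exact constants:

```python
def elo_update(winner: float, loser: float, k: float = 32.0):
    """Elo-style update: the winner gains (and the loser loses) more
    points when the win was an upset, and fewer when it was expected."""
    expected_win = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
    delta = k * (1.0 - expected_win)   # small if the win was expected
    return winner + delta, loser - delta

# Beating a stronger player moves far more points than beating a weaker one.
print(elo_update(1500, 1700))  # upset win: winner gains ~24 points
print(elo_update(1700, 1500))  # expected win: winner gains ~8 points
```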

  • (Score: 4, Insightful) by TheLink on Friday November 27 2015, @03:58AM

    by TheLink (332) on Friday November 27 2015, @03:58AM (#268511) Journal
    Uh don't they do similar things already with rankings? From the summary all I see is someone talking about how to reinvent the wheel. The popular conventional ranking systems already take into consideration all that's mentioned.

    The difference with Microsoft TrueSkill and conventional rankings is they add the uncertainty/variance number - some people might be very inconsistent - can beat the best "on their day". Whereas some people can be consistently bad/good.

    So what you could do is calculate someone's global ranking and how inconsistent that person is in ranking.

    Now if you really want to go beyond that, what you could do is determine groupings. Just like rock-paper-scissors. It could be that a rank #100 "rock"-style player has a 50% chance of beating a rank #50 "scissors"-style player. And there could be more than 3 main groups (or not). You may need to do a lot more math and computing to determine that. I'll leave that to the geniuses.
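The uncertainty/variance idea mentioned above is exactly what TrueSkill adds: each player gets a skill estimate mu and an uncertainty sigma, and the published convention is to display the conservative rating mu - 3*sigma, so that inconsistent or unproven players rank cautiously low. A minimal sketch using TrueSkill's documented defaults (mu = 25, sigma = 25/3):

```python
def conservative_rating(mu: float, sigma: float, k: float = 3.0) -> float:
    """TrueSkill-style displayed rating: the skill estimate mu minus
    k standard deviations. A new player with high uncertainty starts
    at 0; a consistent veteran's rating approaches mu itself."""
    return mu - k * sigma

print(conservative_rating(25.0, 25.0 / 3.0))  # brand-new player → 0.0
print(conservative_rating(30.0, 2.0))         # consistent veteran → 24.0
```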
    • (Score: 0) by Anonymous Coward on Friday November 27 2015, @07:43PM

      by Anonymous Coward on Friday November 27 2015, @07:43PM (#268767)

      They're just looking for ways to oppress. Everyone has a high school diploma and many have college degrees. These people don't understand why they were told to pursue STEM for their careers and yet there are no jobs despite the shortage.

      Braindumps are the way in, not college. The piece of paper still matters, but if people are too dumb to cheat then they will remain unemployed. In much the same vein that a smile makes a lousy umbrella, principled approaches to workplace ethics won't get you a job or allow you to keep one.

      You can still screw up and get fired -- that's a fault of the cheater, for being too dumb to know the basics. But there are plenty of examples of supremely competent people being overlooked because someone else also has an irrelevant A+ cert for dos/win 3.1 so hire that guy instead.

      His score is higher on the metric tabulation because he has something irrelevant that still contributes to his overall calculated value.

  • (Score: 1) by xorsyst on Friday November 27 2015, @12:34PM

    by xorsyst (1372) on Friday November 27 2015, @12:34PM (#268626)

    For an example of this used generically on board games, I can recommend looking at Yucata. There's also loads of good discussion on the merits and issues with it in their forums.