Quote:

Originally Posted by

**ThirdTimeLucky**
1. Lyte's comment is only half correct. The convergence math assume a static population. If a significant percentage of the population is churning, this result cannot be assumed.

Yes and no. At any particular point the ratings may not be perfect, but they are always moving towards it. This isn't a problematic issue, and it exists in all rating systems which continually add people, because adding people adds uncertainty, at the very least for the people who recently entered, and less so for everyone else if the population is large, which it is.

We can see why it adds less uncertainty by pretending that your ranking wasn't a number, but rather a partial order, with each person lined up in order of ranking. If there are three people in line and one person enters and then finds their proper spot, the total order has moved significantly relative to the middle point: the person who was previously ranked in the middle is now second or third out of four. If three hundred people are in line, the total order has moved very little relative to the middle point.
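The line-up argument above can be sketched numerically. This is a hypothetical illustration (the function name and the stand-in ratings are mine, not from the post): insert one new entrant into a sorted line-up and measure how far the old middle player's percentile moves. One insertion shifts any rank by at most one position, which is a large fraction of a 3-person line and a tiny fraction of a 300-person line.

```python
import bisect

def percentile_shift(n_players):
    """How much the old middle player's percentile moves when one
    new entrant slots into a sorted line-up of n_players ratings."""
    line = sorted(range(n_players))            # stand-in ratings 0..n-1
    mid = line[n_players // 2]                 # the current middle player
    before = line.index(mid) / (len(line) - 1)
    bisect.insort(line, -1)                    # entrant slots in at the bottom
    after = line.index(mid) / (len(line) - 1)
    return abs(after - before)

print(percentile_shift(3))    # large relative shift in a short line
print(percentile_shift(300))  # tiny relative shift in a long line
```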

Since ratings are more or less enforced to average to 1200, adding new people doesn't do much to increase the uncertainty in your ELO.

That is to say, new entrants are only problematic in that they consistently make the games containing those new entrants more difficult to predict.

Quote:

2. Actually, ELO does NOT converge well in the multi-player scenario

ELO, like TrueSkill, is a heuristic for Bayesian updating; ELO is just the more common name for it. There are limits to how fast we can become more certain of a value with Bayesian updating, and that limit is a function of the amount of information contained in the newly added data points. Less information, relative to the amount of information currently possessed, means slower updating. Adding players to a game reduces the amount of information we have available on the skill level of each player. As such, there is no system which "converges well" for multiplayer games where the players are not static. This is not a function of the ranking system you use; it is a function of the information contained in the data points.

Many people think that because the original ELO system did not have a way to measure uncertainty, it is not a heuristic Bayesian updater. That is wrong; it simply requires fewer computations and a few more assumptions. It is also likely not as swift in updating as the alternatives.
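To make the "fewer computations, more assumptions" point concrete, here is the classic ELO update in a few lines of Python (a standard textbook formulation, not League's actual implementation). The fixed K-factor is the extra assumption that stands in for an explicit uncertainty estimate:

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One classic ELO update: expected score from the logistic curve,
    then a fixed-K step toward the observed result (score_a is 1 for a
    win, 0 for a loss). The fixed K replaces an explicit uncertainty."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    return r_a + k * (score_a - expected_a)

print(elo_update(1200, 1200, 1))  # winner of an even match gains k/2 = 16
```

Note that the winner's gain equals the loser's loss, which is why the population average stays pinned as mentioned above.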

To understand, roughly, why this is so, consider Bayesian updating:

P(A|B)P(B) = P(B|A)P(A), or, written in the more common form, P(A|B) = P(A)P(B|A)/P(B).

Or, as we are discussing it: the posterior distribution is proportional to the prior distribution multiplied by the likelihood. I say proportional because we don't actually know P(B) in many cases.
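A minimal sketch of one such update, using a grid approximation (all the numbers here, the 1200-centered prior, the 1250-rated opponent, the ELO-style logistic likelihood, are illustrative assumptions, not anyone's production system). The unknown P(B) drops out because we normalise by summing over the grid:

```python
import math

# Candidate ratings and a Gaussian prior centred on 1200 (sd 100)
skills = [1000 + 10 * i for i in range(41)]
prior = [math.exp(-((s - 1200) ** 2) / (2 * 100 ** 2)) for s in skills]

def p_win(rating, opponent=1250):
    # ELO-style logistic likelihood of beating a known 1250 opponent
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

# Observe one win: posterior ∝ prior × likelihood, normalised over the grid
unnorm = [pr * p_win(s) for pr, s in zip(prior, skills)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

mean = sum(s * p for s, p in zip(skills, posterior))
print(round(mean))  # posterior mean shifts above the prior mean of 1200
```

One win barely moves the estimate; that small per-game information content is exactly why convergence is slow.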

There is one interesting thing that TrueSkill does, though it's not actually pertinent to how it matches [well, it may be, but probably not]: it lists your skill not as the mean, but as the mean minus three standard deviations. I.e., your "TrueSkill" rating is the system's "I am 99% confident you're better than this" point.
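That conservative display is a one-liner. The defaults below (μ = 25, σ = 25/3) are TrueSkill's published starting values, which is why a brand-new player's displayed rating starts near zero:

```python
def conservative_rating(mu, sigma):
    # TrueSkill-style displayed rating: mean minus three standard deviations.
    # High uncertainty (large sigma) drags the displayed number down.
    return mu - 3 * sigma

print(conservative_rating(25.0, 25 / 3))  # new player: 25 - 3*(25/3) = 0
```

As games are played, σ shrinks and the displayed rating climbs toward the mean even if the mean itself barely moves.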

From there, all we're doing is tuning our likelihood function until we converge as swiftly as possible. For a game like League, with millions of players and low information content per game, that will take quite a while.

It is also important to note that "TrueSkill", as described in that link, is not showing convergence for games like League. It's showing convergence for games like Halo, where the final score determines the winner and provides valuable information about who is better than whom. In League we can't easily detect when one side wins handily, both because the nexus is the only thing that matters, and because it's much too easy to game stats to inflate your "skill" without actually being better at the game. Whereas in Halo going 50/1 means you dominated, in League you can go 50/1 and still lose, and in going 50/1 you probably were not playing as efficiently as you could [for various reasons not worth discussing here].

Note also that while the name League of Legends uses for its heuristic is ELO, it is not free from uncertainty. And of course there must be a minimum level of uncertainty in ratings, because without it, it becomes impossible to move your rating.

Quote:

3. The most important argument against ELO as implemented here is a bit more subtle -

It creates a terrible gaming experience. To quote again:

[...]

Simply put, newer players are not average.

Actually, in League "new" players to ranked are pretty average. It's one of the advantages of having to play 200+ games and own 16 champions before starting ranked: there are no "newbies".

Quote:

This "downdraft" makes it more difficult to move up. (because if you end up below the starting level, you are more likely than average to be paired with players who should be going down". Finally, the new player finds themselves stuck with players who have richly "earned" their low ELO over time due to trolling, frequent afking, or other issues.

If newer players experience an unfair downdraft, then they're more skilled than their teammates, and the probability of their winning matches increases. ELO hell does not exist: more trolls and AFKs will exist on the other team, creating a net advantage for the player who is legitimately more skilled than his ranking suggests.

Quote:

There are plenty of documented cases of huge streaks. If the system was actually working, the probability of say, a 14 losing streak occurring is just over 0.006% - that is 6 chances in 100,000. Show us that such streaks are as rare as ELO theory says it should be.

No. The probability of seeing a 14-game losing streak is not simply 0.006%. The probability of seeing a 14-game losing streak in *any particular 14 games*, assuming that the probability of winning each game is stable at 50%, is about 0.006%. But... and this is a big but. The probability of seeing at least one 14-game losing streak somewhere in a large number of ranked games, making the same stable win-probability assumption, is actually quite high. I don't have numbers, since IIRC doing this requires simulation and I am lazy. But I can tell you that the probability of seeing 9 heads or 9 tails in a row if you flip a coin 1000 times is about 80%, and 10 in a row is about 60%.
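For what it's worth, the "at least one streak somewhere" probability doesn't actually need simulation; a small dynamic program over the current streak length computes it exactly. This is my own sketch under the same assumptions as above (independent games, stable 50% win rate). The coin-flip case reduces to the same program, since a run of 9 identical flips is the same event as 8 consecutive "repeats" among the 999 adjacent pairs:

```python
def prob_streak(n, length, p):
    """Exact probability of at least one run of `length` consecutive
    losses in n independent games, each lost with probability p.
    State: probability mass by current losing-streak length."""
    state = [0.0] * length
    state[0] = 1.0
    absorbed = 0.0  # mass that has already completed the streak
    for _ in range(n):
        new = [0.0] * length
        for k, mass in enumerate(state):
            if mass == 0.0:
                continue
            new[0] += mass * (1 - p)       # a win resets the streak
            if k + 1 == length:
                absorbed += mass * p       # streak completed
            else:
                new[k + 1] += mass * p     # streak grows by one
        state = new
    return absorbed

# Any *particular* 14 games: matches 0.5**14, about 0.006%
print(prob_streak(14, 14, 0.5))
# At least one 14-loss streak somewhere in 1000 games at a 50% win rate
print(prob_streak(1000, 14, 0.5))
# Run of 9 identical flips in 1000 fair coin tosses = 8 consecutive
# "repeats" among 999 adjacent pairs; 10 identical = 9 repeats
print(prob_streak(999, 8, 0.5))
print(prob_streak(999, 9, 0.5))
```

The last two prints land close to the ballpark figures quoted above for 9 and 10 in a row.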

Since large streaks can "feed" on themselves due to poor play from psychological factors, it's not really a surprise for this to be seen in "large" numbers.

It would be interesting to know whether large streaks are common or uncommon [I suspect large streaks of losses are more common than large streaks of wins], but I am not sure that knowing we see large losing streaks more often than large winning streaks tells us anything particularly interesting about our matchmaking system.