One of the acknowledged benefits of the Tribunal reform card system introduced to League of Legends a few weeks ago was the ability to look up any Reform Card, at any time, just by making up a plausible case number. I soon realized that, with an automated system, I could download large numbers of games and perform some analysis on them, trying to learn about the Tribunal and the game as a whole.
NOTE 1: Gathering these data did not involve hacking, cracking, or breaking any form of Riot's web security. All of the information is available on publicly accessible web pages, and automated downloading of those pages complied with leagueoflegends.com's webcrawler guidelines as specified in robots.txt.
NOTE 2: These data are only meaningful so far as they represent an accurate sample of Tribunal cases, or of League of Legends games. I assume that every Tribunal case produces a publicly accessible Reform Card, and I have pretty good evidence that this is the case, but if I am wrong, my conclusions may be invalid.
I downloaded 9254 Reform Cards, randomly selected between case 5572301 and case 5637015. This represents about a week's worth of Reform Cards. My dataset is a sample which includes about 1/7th of the cases in that range, but since it is a random sample, it should be more than sufficient to draw conclusions about the entire block.
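For anyone wanting to reproduce the sampling step, here is a minimal sketch. It assumes every integer in the quoted range is a valid case number (the same assumption as NOTE 2), and the helper names are hypothetical. Downloading and parsing the cards is left out entirely.

```python
import random

# Case-number range quoted above (both ends inclusive).
FIRST_CASE = 5572301
LAST_CASE = 5637015
RANGE_SIZE = LAST_CASE - FIRST_CASE + 1  # 64715 case numbers


def sample_case_ids(n, seed=None):
    """Draw n distinct case numbers uniformly at random, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(FIRST_CASE, LAST_CASE + 1), n))


def sampling_fraction(n):
    """Fraction of the case-number range covered by a sample of size n."""
    return n / RANGE_SIZE


case_ids = sample_case_ids(9254, seed=1)
print(len(case_ids), round(sampling_fraction(9254), 3))  # 9254 0.143
```

9254 out of 64715 case numbers is 0.143, which is where the "about 1/7th" comes from.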
The dataset includes everything you can see when loading a reform card on the website, plus a number of other facts that your browser actually downloads every time but that do not display. For example, the k/d/a, gold, cs, and item builds of the enemy team are actually sent to your browser, and I have captured them.
The simplest question to ask is "How often does the Tribunal vote to punish?" And already, we find the first curious fact about Reform cards.
38.78% of cases are voted Pardon, and 61.22% Punish. However, 39.99% of cases show a punishment of None. In 1.21% of cases (112 cases in my dataset) a Punish vote is listed but the punishment assigned is None. These cases are intriguing, and we'll come back to them in a moment.
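The 1.21% figure falls straight out of the other two: if every Pardon carries a punishment of None, then 39.99% - 38.78% = 1.21% of cases must be Punish verdicts with no punishment. A small sketch of the tally, assuming each parsed card has been reduced to a (verdict, punishment) pair of strings. The function name and toy cards are made up for illustration.

```python
from collections import Counter


def verdict_summary(cards):
    """Summarise (verdict, punishment) pairs from parsed Reform Cards."""
    n = len(cards)
    verdicts = Counter(verdict for verdict, _ in cards)
    no_punishment = sum(1 for _, punishment in cards if punishment == "None")
    # The inconsistent cases: a Punish verdict that lists no punishment.
    mystery = sum(1 for v, p in cards if v == "Punish" and p == "None")
    return {
        "pardon": verdicts["Pardon"] / n,
        "punish": verdicts["Punish"] / n,
        "none": no_punishment / n,
        "punish_but_none": mystery,
    }


# Hypothetical toy cards, not real Tribunal data.
cards = [("Pardon", "None"), ("Punish", "Warning"),
         ("Punish", "None"), ("Punish", "Time Ban")]
print(verdict_summary(cards))
```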
Next up is the question "How often are warnings, temp bans, and permabans issued?"
Limiting ourselves to the Punish cases, the data are as follows.
None (see above) - 1.35%
Warning - 47.34%
Time Ban - 47.53%
Permaban - 3.77%
Proportionally, very few cases result in a permanent ban. However, that number represents 198 cases captured, which extrapolates to around 1400 over the entire period.
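The extrapolation is just the sample count scaled by the inverse of the sampling fraction. A sketch, again assuming every number in the case range is a real case:

```python
def extrapolate(count_in_sample, sample_size, population_size):
    """Scale a count seen in a simple random sample up to the whole population."""
    return count_in_sample * population_size / sample_size


# Figures from the post: 198 permabans among 9254 sampled cards,
# drawn from case numbers 5572301..5637015 (64715 numbers).
estimate = extrapolate(198, 9254, 5637015 - 5572301 + 1)
print(round(estimate))  # 1385
```

That is the "around 1400" above, before rounding.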
Next, how often is the decision controversial?
Controversial - 38.42%
Majority - 42.41%
Overwhelming - 19.17%
These numbers alone aren't very meaningful, because we don't know how Riot divides results up into these categories. Controversial may only contain cases that are within 1% of the decision break point, or it may contain cases within 5%, or 10%.
However, it is probably safe to assume that those bins are the same from case to case, so we can ask "Does the degree of agreement vary by the verdict of the case?"
Here, for the first time, I won't just post a bunch of numbers. Now we are asking proper statistical questions, and the answer comes in two parts. First, is there a difference, and second, is that difference meaningful?
Yes. Pardon cases are 12% more likely to have an overwhelming majority, and punish cases are 12% more likely to be controversial. There is no significant difference between the agreement rates for the various punishments.
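I haven't said which test decides whether the difference is meaningful. One standard choice (assumed here, not necessarily what was used for the numbers above) is Pearson's chi-square test of independence on a verdict-by-agreement table. A dependency-free sketch with made-up counts:

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Rows could be verdicts (Pardon, Punish) and columns agreement levels
    (Controversial, Majority, Overwhelming).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts for illustration only (not the Tribunal data).
table = [[30, 40, 30],
         [50, 40, 10]]
stat, dof = chi_square_independence(table)
# 5.991 is the 5% critical value of the chi-square distribution with 2 dof.
print(round(stat, 2), dof, stat > 5.991)  # 15.0 2 True
```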
One possible interpretation of this (and this is just speculation) is that about 2-3% of Tribunal cases are blatant errors, and Tribunal voters notice them and reject them. Another interpretation is that the break point for punish is well above 50%, so the "Overwhelming Majority" bin for Pardon simply covers more possible outcomes.
Remember those punish cases without punishments? What about them? It turns out that the agreement profile on those cases matches the Punish cases far more closely than the Pardon cases, which means the verdict matches the votes. It's still not clear why there is no punishment, however.
Game and Report Totals
Next up, how do the total number of reports and the total number of games figure into things?
First, how many reports does the typical case have? The median case has 5, and the average is 5.48. The mode (most common value) is 4. Two cases in the sample have no reports (both are pardons); I can only assume that is due to a Tribunal bug. There are very few cases above 15 or so reports, but one enterprising person managed to collect 33.
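The three summary numbers are straightforward with Python's standard statistics module. The report counts below are hypothetical:

```python
import statistics


def describe(values):
    """Median, mean and mode of a list of per-case counts."""
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
    }


# Hypothetical report counts for a handful of cases.
reports = [4, 4, 5, 6, 9, 2, 4, 7]
print(describe(reports))  # {'median': 4.5, 'mean': 5.125, 'mode': 4}
```

A mean above the median, as in the real data, is what you'd expect from a long tail of heavily reported cases like the 33-report one.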
As reports go from 1 to 7, the chance of punishment increases quickly from around 20% to around 80%, then climbs very slowly thereafter.
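Curves like this come from grouping cases by one value and taking the Punish share in each group. A sketch, assuming each case is a dict with hypothetical "reports", "games" and "verdict" keys:

```python
from collections import defaultdict


def punish_rate_by(cases, key):
    """Share of Punish verdicts among cases grouped by key(case)."""
    totals = defaultdict(int)
    punished = defaultdict(int)
    for case in cases:
        group = key(case)
        totals[group] += 1
        if case["verdict"] == "Punish":
            punished[group] += 1
    return {group: punished[group] / totals[group] for group in sorted(totals)}


# Hypothetical toy cases, not real Tribunal data.
cases = [
    {"reports": 1, "games": 1, "verdict": "Pardon"},
    {"reports": 1, "games": 2, "verdict": "Punish"},
    {"reports": 7, "games": 3, "verdict": "Punish"},
    {"reports": 7, "games": 3, "verdict": "Punish"},
]
print(punish_rate_by(cases, key=lambda c: c["reports"]))  # {1: 0.5, 7: 1.0}
print(punish_rate_by(cases, key=lambda c: c["games"]))    # {1: 0.0, 2: 1.0, 3: 1.0}
```

Swapping the key from reports to games gives the per-game-count version below; the pardon chance is just one minus the punish share.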
We can do a similar analysis of the number of games in a case. The mean is 2.73, the median is 3, and the mode is 2. Pardon chances range from 62% in single-game cases down to 24% in five-game cases.
Lastly, we can consider reports per game. Again, as reports per game goes up, so does the chance of punishment.
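Reports per game is a continuous ratio, so it needs bins before you can compute a punish rate. I haven't listed the bins, so the edges below are a hypothetical choice. The lower edge of each bin is inclusive.

```python
import bisect
from collections import Counter

# Hypothetical bin edges for reports per game.
EDGES = [1.0, 2.0, 3.0]
LABELS = ["under 1", "1 to 2", "2 to 3", "3 or more"]


def ratio_bin(reports, games):
    """Place a case in a bin by reports per game (lower edge inclusive)."""
    return LABELS[bisect.bisect_right(EDGES, reports / games)]


def punish_rate_by_ratio(cases):
    """Share of Punish verdicts in each non-empty reports-per-game bin."""
    totals, punished = Counter(), Counter()
    for case in cases:
        b = ratio_bin(case["reports"], case["games"])
        totals[b] += 1
        if case["verdict"] == "Punish":
            punished[b] += 1
    return {b: punished[b] / totals[b] for b in LABELS if totals[b]}


# Hypothetical toy cases.
cases = [
    {"reports": 2, "games": 3, "verdict": "Pardon"},
    {"reports": 3, "games": 2, "verdict": "Pardon"},
    {"reports": 6, "games": 2, "verdict": "Punish"},
    {"reports": 9, "games": 3, "verdict": "Punish"},
]
print(punish_rate_by_ratio(cases))
```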
Also, remember the mystery cases that are a Punish verdict with no punishment? In these stats, they fall about halfway between the straight punishes and the pardons, which doesn't help explain them.
Unfortunately, there isn't a lot to say about the rest of the cases, because we don't know how these games are being chosen. The one-report cases seem almost worthless to include in the Tribunal at all, though.
I will continue to do analysis and update this thread, probably once every few days.
Next, in Part 2, I will look at statistics derived from individual game records. Champion selection, k/d/a, win/loss, game length...
In Part 3, I plan to get even more ambitious and find out if there really are instapunish words in chat, and what they are. Also, I will possibly look at item builds.
In Part 4, if I can keep this up that long, I will look at things I haven't thought of yet, or take user requests, or something.
A note about Part 1: It turns out the mystery cases are duplicate reports caused by a Tribunal bug. However, I chose not to remove them from the set, because I have no way of finding, or even confirming the existence of, any comparable duplicate Pardon cases.
Part 2: Per-Game Data
In Part 1, I was looking exclusively at data gathered from the main page describing the case. Now, I will start looking at the individual games.
This analysis will be a little different from Part 1, because by looking at individual games, I can try to answer questions not just about the Tribunal, but about the game as a whole.
We'll start with a simple question: Who is the most popular champion? Remember, this covers all players in games which appear in a Tribunal case for one week, about 2.5 weeks ago.
The top 5 champions are:
Ashe - 3.59%
Teemo - 3.08%
Master Yi - 2.70%
Ezreal - 2.43%
Darius - 2.19%
The bottom 5 champions, on the other hand, are:
Gragas - 0.31%
Xerath - 0.29%
Viktor - 0.22%
Trundle - 0.17%
Karma - 0.17%
(Zyra) - 0.13%
Zyra isn't really that underplayed; she was not available for the entire period of data gathering, so her data aren't useful for comparison.
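The popularity figures are each champion's share of all player slots in the sampled games. A sketch, assuming the champion picks have been flattened into one list with one entry per player per game. The function and the optional exclude parameter are hypothetical.

```python
from collections import Counter


def pick_shares(champion_picks, exclude=()):
    """Share of all player slots taken by each champion, most popular first."""
    counts = Counter(c for c in champion_picks if c not in exclude)
    total = sum(counts.values())
    return [(champ, n / total) for champ, n in counts.most_common()]


# Hypothetical picks from a couple of games.
picks = ["Ashe", "Teemo", "Ashe", "Ezreal", "Zyra"]
print(pick_shares(picks, exclude={"Zyra"}))
```

Passing exclude={"Zyra"} drops her from both the counts and the denominator, which is what the later tables do.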
Now, let's compare that with reported players:
Master Yi - 3.91%
Teemo - 3.30%
Ashe - 3.24%
Ezreal - 2.63%
Darius - 2.59%
Yorick - 0.28%
Xerath - 0.28%
Viktor - 0.23%
Karma - 0.17%
(Zyra) - 0.14%
Trundle - 0.11%
And punished players:
Master Yi - 3.85%
Teemo - 3.26%
Ashe - 3.15%
Darius - 2.59%
(bottom 5 are roughly the same as above)
So now comes the big question: which champion is punished most in proportion to how often it is played or reported?
Zyra is excluded from all of the following tables, since she was new enough at the time of data gathering that the Tribunal would not have had time to build many cases for her. Remember, however, that every game has a reported champion in this set, so these numbers will all be much higher than the LoL population in general.
First, we'll compare how often a champion is reported to how often it is played.
Tryndamere - 15.8% report rate
Twitch - 15.6%
Master Yi - 15.1%
Evelynn - 14.9%
Mordekaiser - 13.9%
Ahri - 6.8%
Sona - 6.0%
Taric - 5.9%
Leona - 5.6%
Janna - 5.4%
So when seen, Tryndamere is the most likely to be reported, and Janna the least.
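All three tables in this section (report rate, conviction rate, punishment rate) are ratios of the same three per-champion counts: times played, times reported, and times punished. A sketch, assuming per-player records with hypothetical "champion", "reported" and "punished" fields:

```python
from collections import Counter


def champion_rates(players, exclude=()):
    """Per-champion report, conviction and punishment rates.

    players: iterable of dicts with "champion", "reported" (bool) and
    "punished" (bool; only True for a reported player).
    """
    played, reported, punished = Counter(), Counter(), Counter()
    for p in players:
        champ = p["champion"]
        if champ in exclude:
            continue
        played[champ] += 1
        if p["reported"]:
            reported[champ] += 1
            if p["punished"]:
                punished[champ] += 1
    rates = {}
    for champ, n in played.items():
        rates[champ] = {
            "report_rate": reported[champ] / n,             # reported / played
            "conviction_rate": (punished[champ] / reported[champ]
                                if reported[champ] else None),  # punished / reported
            "punishment_rate": punished[champ] / n,         # punished / played
        }
    return rates


# Hypothetical toy records.
players = (
    [{"champion": "Janna", "reported": False, "punished": False}] * 3
    + [{"champion": "Janna", "reported": True, "punished": False},
       {"champion": "Twitch", "reported": True, "punished": True},
       {"champion": "Twitch", "reported": False, "punished": False},
       {"champion": "Zyra", "reported": True, "punished": True},
       {"champion": "Sona", "reported": False, "punished": False}]
)
rates = champion_rates(players, exclude={"Zyra"})
```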
That covers reports, but once reported, who is most likely to be punished?
Irelia - 77.7% conviction rate
Udyr - 77.6%
Volibear - 75.9%
Trundle - 73.3%
Xerath - 73.2%
Maokai - 57.6%
Malphite - 56.8%
Skarner - 55.7%
Jayce - 54.2%
Leona - 52.6%
Irelia? Really? I would never have guessed that one. (Note that Trundle and Xerath have few enough cases that their numbers are likely not particularly stable from week to week)
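One way to put a number on "not particularly stable" is a Wilson score interval around each conviction rate. I didn't compute this for the tables above, and the counts below are hypothetical. They just show how much wider the interval gets for a rarely reported champion at the same rate.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        raise ValueError("need at least one trial")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# Same 73% conviction rate, very different sample sizes (hypothetical counts).
print(wilson_interval(11, 15))      # a champion with only 15 reports
print(wilson_interval(1100, 1500))  # a champion with 1500 reports
```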
Lastly, we can combine the two and directly compare plays to punishment verdicts (remember, 6.8% of all players in the sample were punished):
Evelynn - 10.8% punishment rate
Twitch - 10.4%
Master Yi - 9.7%
Tryndamere - 9.6%
Twisted Fate - 9.4%
Taric - 4.0%
Ahri - 3.9%
Sona - 3.7%
Janna - 3.3%
Leona - 2.9%
That is to say, Evelynn is the champion most punished by the Tribunal per time she is played, and Leona the least punished.
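This table is the product of the other two: punished/played = (reported/played) × (punished/reported). So dividing a champion's punishment rate by its report rate should recover its conviction rate. For Leona, 2.9 / 5.6 ≈ 51.8%, which agrees with the 52.6% in the conviction table to within the rounding of the one-decimal percentages. A quick check using the numbers from the tables above, limited to champions that appear in both lists:

```python
# Percent of games played, copied from the report-rate and punishment-rate
# tables above (only champions listed in both).
REPORT_RATE = {"Tryndamere": 15.8, "Twitch": 15.6, "Master Yi": 15.1,
               "Evelynn": 14.9, "Ahri": 6.8, "Sona": 6.0, "Taric": 5.9,
               "Leona": 5.6, "Janna": 5.4}
PUNISH_RATE = {"Evelynn": 10.8, "Twitch": 10.4, "Master Yi": 9.7,
               "Tryndamere": 9.6, "Taric": 4.0, "Ahri": 3.9, "Sona": 3.7,
               "Janna": 3.3, "Leona": 2.9}


def implied_conviction(champion):
    """punished/reported = (punished/played) / (reported/played)."""
    return PUNISH_RATE[champion] / REPORT_RATE[champion]


for champion in sorted(REPORT_RATE):
    print(f"{champion}: {implied_conviction(champion):.1%}")
```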
I find it interesting that Ahri is very near the bottom of the first and third lists, which are otherwise filled with supports, but I don’t know what to make of that.
Game Outcome, Kills, Deaths, and Assists
What champion gets the most kills per game?
Twitch - 8.5
Darius - 8.5
Fizz - 8.3
Akali - 8.3
Katarina - 8.0
Leona - 2.8
Sona - 2.6
Taric - 2.0
Soraka - 1.8
Janna - 1.6
This supports the many “DARIUS=OP QQ” threads over in General Discussion, but I didn’t expect Twitch to top him by a few hundredths...
What champion dies the most?
Karthus - 7.3
Twitch - 7.2
Master Yi - 6.9
Xin Zhao - 6.9
(Zyra) - 6.9
Twisted Fate - 6.8
Malphite - 4.8
Taric - 4.7
Soraka - 4.6
Anivia - 4.6
Janna - 4.0
Apparently, egg really is that good at keeping Anivia alive. And maybe sacrifice Karthus is more popular than I thought.
And lastly, assists by champion.
Taric - 12.2
Soraka - 12.1
Sona - 12.0
Janna - 12.0
Alistar - 11.4
Darius - 6.1
LeBlanc - 6.1
Fiora - 6.0
Vayne - 5.6
Master Yi - 5.4
Supports at the top, burst-heavy carries at the bottom.
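The kills, deaths and assists lists are per-champion averages, sorted with the top and bottom five taken off each end. A sketch, assuming one (champion, kills, deaths, assists) row per player per game. The helper names are hypothetical.

```python
from collections import defaultdict


def average_kda(rows):
    """Mean kills, deaths and assists per game for each champion."""
    totals = defaultdict(lambda: [0, 0, 0, 0])  # kills, deaths, assists, games
    for champ, kills, deaths, assists in rows:
        t = totals[champ]
        t[0] += kills
        t[1] += deaths
        t[2] += assists
        t[3] += 1
    return {champ: (k / n, d / n, a / n) for champ, (k, d, a, n) in totals.items()}


def top_and_bottom(averages, stat, n=5):
    """Champions ranked by one stat (0=kills, 1=deaths, 2=assists): top n, bottom n."""
    ranked = sorted(averages, key=lambda c: averages[c][stat], reverse=True)
    return ranked[:n], ranked[-n:]


# Hypothetical rows.
rows = [("Twitch", 10, 7, 3), ("Twitch", 7, 7, 5),
        ("Janna", 1, 4, 12), ("Janna", 2, 4, 12)]
averages = average_kda(rows)
print(averages)
print(top_and_bottom(averages, stat=0, n=1))
```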
Winners vs. Losers
How different are the k/d/a results between winning teams and losing teams? (these numbers are per champion)
Winners get 7.2 kills/game, and losers 4.5 kills/game.
Winners die 4.5 times a game, and losers 7.3 times. (These mirror the kill numbers above, since every kill is a death, and nearly every death is a kill.)
Winners get 10 assists/game, and losers get 6 assists/game.
Now, back to explicitly Tribunal-related data: how different is the reported player from his team? It’s clear that these numbers need to be normalized by outcome, so I’ll consider losing reported players and winning reported players separately.
Remember, these stats only cover the team with the reported player on it, so they won’t quite match up to the numbers above.
In losing games, the reported player’s allies averaged 4.4 kills, and the reported player only averaged 3.3 kills. Allies averaged 7 deaths, the reported player 7.6 deaths. Allies averaged 5.8 assists, and the reported player averaged 5 assists.
In winning games, the numbers aren’t so clear. Allies averaged 7.5 kills, while the reported player averaged 8.1 kills. Deaths were 5.1 for allies, 5.8 for the reported player. Assists were 10.7 for allies, 9.5 for the reported player.
Limiting the numbers to only punished players, and not just reported ones, doesn’t result in too many changes. For the losing team, an offender averages 3.9 kills, 8.6 deaths, and 5 assists. On the winning team, the numbers are 8/6.1/9.5. Worse play seems to lead to a slightly higher conviction rate (easily accounted for by blatant intentional feeding cases), except that convicted losing players have a higher kill count than reported losers overall.
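The reported-player-versus-allies comparison splits games by outcome first, then averages one stat for each side. A sketch, assuming a hypothetical per-game record holding the reported player's stats and a list of his four allies' stats:

```python
from statistics import mean


def reported_vs_allies(games, stat):
    """Mean of `stat` for reported players and for their allies, by outcome.

    games: list of dicts with "outcome" ("win" or "loss"), "reported" (stat
    dict for the reported player) and "allies" (stat dicts for teammates).
    """
    result = {}
    for outcome in ("win", "loss"):
        subset = [g for g in games if g["outcome"] == outcome]
        if not subset:
            continue
        reported = mean(g["reported"][stat] for g in subset)
        # Averaged over all ally slots; same as averaging per game when
        # every team has exactly four allies.
        allies = mean(a[stat] for g in subset for a in g["allies"])
        result[outcome] = (reported, allies)
    return result


# Hypothetical games.
games = [
    {"outcome": "loss", "reported": {"kills": 3},
     "allies": [{"kills": 4}, {"kills": 5}, {"kills": 4}, {"kills": 3}]},
    {"outcome": "win", "reported": {"kills": 8},
     "allies": [{"kills": 7}, {"kills": 8}, {"kills": 6}, {"kills": 9}]},
]
print(reported_vs_allies(games, "kills"))
```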
There is a lot more I could do in this section: normalizing reported player kills by champion, deriving a single game-impact value that accounts for kills, deaths, and assists, or looking at gold or other numbers. But I’ve already gone pretty long for one section, and I don’t want to burn out on the boring number crunching before Part 3, where I start looking at chat logs. So that’s it for this section. If there’s anything I didn’t cover that you are burning to know, ask in the thread and I’ll see if I can get it into Part 4.
Part 3: The Chat Log
The main tool the Tribunal voter is given to judge a case is the complete chat log of the game. This is the most subjective and difficult place to make a judgement. Who is trolling? How much of an insult is too much?
I have decided to try to see if I can spot trends in Tribunal behavior by looking at certain categories of words. First, however, a big caveat. This analysis required me to pick a few representative words for each behavior. Typos, misspellings, words I didn’t think of, words used in a different context, and many other things may throw these conclusions completely off.
Second, I’m looking at one game at a time here, instead of one case at a time. That means that cases with many games will skew these numbers relative to the ones in Part 1.
So, let’s begin.
The first word I want to talk about is the most commonly discussed instapunish word, n****r. And indeed, it is pretty much an instapunish word.
0 usages - 65% Punish
1 usage - 87% Punish
2 usages - 94% Punish
3 usages - 93% Punish
4+ usages - 100% Punish
As for the overall prevalence of the word, 0.52% of players overall in the database use it at least once, and 2.11% of reported players use it.
The Other N-word
This one I can type in without self-censoring. Noob. The more it is said, the more likely a punishment is, but not so dramatically as above.
0 usages - 62% Punish
1-8 usages - ~75% Punish
9+ usages - ~85% Punish
As for the overall prevalence of this word, 11.9% of players overall in the database use it at least once, and 31.3% of reported players use it.
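Counting a word per player and bucketing the counts is enough to build both the punish table and the prevalence figure. Here's a sketch using whole-word, case-insensitive matching. That's an assumption about how matching works, and it shows the caveat from the start of this part: "noobs" and "n00b" are both missed.

```python
import re


def count_word(chat_lines, word):
    """Whole-word, case-insensitive count of `word` across one player's chat lines."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return sum(len(pattern.findall(line)) for line in chat_lines)


def usage_bucket(n):
    """Buckets used in the noob table: 0, 1-8, 9+."""
    if n == 0:
        return "0"
    if n <= 8:
        return "1-8"
    return "9+"


def prevalence(players_chat, word):
    """Share of players who use `word` at least once."""
    users = sum(1 for lines in players_chat if count_word(lines, word) > 0)
    return users / len(players_chat)


# Hypothetical chat for one player: only "noob team" and "NOOB!!" match.
n = count_word(["noob team", "NOOB!!", "noobs lol", "n00b"], "noob")
print(n, usage_bucket(n))  # 2 1-8
```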
Now it gets a little fuzzier, because there are too many swear words to look for comprehensively. I made two categories (bad swears and not-so-bad swears), and the punishment curves for both of them look about like the one for noob.
Game Skill Insults
I put this one in because it bothers me a fair amount. These are words that specifically mean that the other person shouldn’t be playing the game. “l2p” and “uninstall” are what the algorithm searches for. The progression isn’t nearly as clean as for the other categories above, but conviction does go up as these words are used more.
Next are non-swear-word insults: things like “idiot”, “moron”, and so forth. For these, the curve is almost exactly the same as the noob one again.
Lest you think I am obsessed with bad language, I am also looking for polite words, like “please”, “thanks”, and “gj”. And here is where things get interesting. Being polite correlates with a small but measurable increase in conviction chances. This could be from people who are being insulting but using polite words, of course, but it’s food for thought.
0 usages - 63% Punish
1+ usages - 67% Punish
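The category filters (skill insults, polite words, and so on) count how many tokens in a player's chat fall into each word list. A sketch using the words named above. The real lists may well have had more entries, and the tokenizer is a simplifying assumption.

```python
import re

# Word lists named in the text; the real filters may have used more words.
CATEGORIES = {
    "skill_insult": {"l2p", "uninstall"},
    "polite": {"please", "thanks", "gj"},
}
TOKEN = re.compile(r"[a-z0-9']+")


def category_counts(chat_lines):
    """Number of tokens in each category across one player's chat lines."""
    counts = {name: 0 for name in CATEGORIES}
    for line in chat_lines:
        for token in TOKEN.findall(line.lower()):
            for name, words in CATEGORIES.items():
                if token in words:
                    counts[name] += 1
    return counts


print(category_counts(["l2p noob", "pls uninstall", "gj, thanks"]))
```

A count of 0 versus 1+ then gives the two rows of the politeness table. Note that "pls" slips through, the same kind of miss warned about at the start of this part.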
Maybe it’s not what you say, but how much you say? Look at the number of lines of chat in a game, and things get really interesting.
0 lines - 52% Punish (this represents only 2.4% of games, though. Most people talk.)
1-10 lines - 50% Punish
11-50 lines - 63% Punish
51+ lines - 72% Punish
So, there is at least a little to be said for the idea that if you don’t want to be reported, don’t talk. Of course, we’re only looking at reported people here, so it’s approaching the situation backwards. Plenty of people talk all the time and are very rarely reported; they just aren’t in the Tribunal.
Other Chat Facts
Another thing I’ve tried is to identify the language being spoken by a player. This is a lot harder than it sounds. Ultimately, rather than try anything clever or complicated, I decided to look for a few simple words in both English and in Portuguese/Spanish (difficult to distinguish without proper diacritics), and accept that I would be missing a lot.
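A keyword-based language guess can be as simple as checking which marker-word set a player's tokens hit. The marker words below are hypothetical, because the actual lists aren't given here. They are written without diacritics, since chat mostly lacks them.

```python
import re

# Hypothetical marker words; not the lists actually used.
ENGLISH = {"the", "you", "and", "what", "why", "please"}
PT_ES = {"que", "nao", "voce", "para", "porque", "pra"}


def guess_language(chat_lines):
    """Return 'english', 'pt/es', 'ambiguous' or 'unknown' for one player."""
    tokens = {t for line in chat_lines for t in re.findall(r"[a-z]+", line.lower())}
    english = bool(tokens & ENGLISH)
    pt_es = bool(tokens & PT_ES)
    if english and not pt_es:
        return "english"
    if pt_es and not english:
        return "pt/es"
    if english and pt_es:
        return "ambiguous"
    return "unknown"


print(guess_language(["why you no ward"]))  # english
print(guess_language(["mia", "re"]))        # unknown
```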
My results are as follows.
44.8% of the talking population of LoL is recognizably speaking English.
0.9% of the talking population of LoL is recognizably speaking Portuguese or Spanish.
Obviously, this isn’t catching everyone in either case (either because they are just saying mia, re, etc., or because they are speaking recognizable English/Portuguese and I’m not detecting it).
Assuming that those two groups should actually total 100%, the adjusted numbers are 98.1% English and 1.9% Portuguese/Spanish.
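The adjustment rescales the two recognized shares so they sum to 100%. That implicitly assumes the unrecognized players split in the same proportion. Using the rounded shares printed above gives about 98.0% and 2.0%. The 98.1% and 1.9% presumably come from the unrounded shares.

```python
def renormalize(shares):
    """Rescale shares so they sum to 1, dropping the unrecognized remainder."""
    total = sum(shares.values())
    return {lang: share / total for lang, share in shares.items()}


adjusted = renormalize({"English": 44.8, "Portuguese/Spanish": 0.9})
for lang, share in adjusted.items():
    print(f"{lang}: {share:.1%}")
```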
Three months ago, I would have said that number was obviously wrong, but I can’t remember being matchmade with Portuguese speakers recently. Maybe the beta Brazilian server is attracting them away?
Can looking at chat habits predict who wins a match? If a player sends more than 50 lines of chat in a match, their winning percentage is only 40% (the average is, for obvious reasons, 50%).
What about game-related chat, though? I built an “on-topic words” filter like the ones above, but for “mia”, “top”, “bot”, “baron”, etc. Usage of on-topic language does not influence winning percentage in any meaningful way, however.
I know I said I would try to add item analysis here, but the coding necessary looks complicated, and while I find this project entertaining, running these numbers over and over is getting a bit boring. So I’m going to call this section here, and spot-check user requests or other random tidbits I’ve thought of for Part 4 later on.
Part 4: Random stuff
In this section, I'll answer analysis questions, present a few miscellaneous findings, and muse about what I really figured out.
IIRC Lyte said that the ones that say both "Verdict: Punish" and "Punishment: None" were bugs that were in fact punished in some way. Let me see if I can dig up his post quickly.
EDIT: Here we go. My bad, they were actually duplicate bugs, rather than discrete cases. You should scrub them from your data, most likely - they're weighting the analysis incorrectly.
I'll be honest....I require part 2 ASAP.
And can you provide any PROOF as to the sources of your data? Not that I don't believe you per se, but you're gonna start getting a lot of naysayers in this thread real quick.
People on the forums refuse to acknowledge any problems with the Tribunal, even if Lyte himself says "we're working on improving feature x" etc etc.
5606497 5606509 5606556 5606582 5606599 5606600 5606613 5606641 5606663
5606755 5606804 5606817 5606885 5606896 5606986 5607042 5607068 5607111
5607171 5607199 5607280 5607319 5607340 5607356 5607365 5607382 5607815
5607840 5607860 5607869 5607883 5607892 5607915 5608015 5608061 5608143
5608201 5608271 5608445 5608575 5608604 5608611 5608650 5608654 5608676
5608680 5608721 5608723 5608750 5608758 5608759 5608775 5608803 5608809
5608812 5608840 5608857 5608860 5608884 5608893 5608909 5608920 5608965
5608967 5608987 5608988 5608999 5609021 5609059 5609075 5609081 5609082
5609083 5609088 5609135 5609145 5609175 5609194 5609206 5609207 5609227
5609239 5609332 5609334 5609348 5609363 5609364 5610286 5610292 5610304
5610489 5610494 5610499 5610505 5610525 5610532 5610540 5610544 5610559
5610595 5610605 5610623 5610629 5610642 5610695 5610700 5610707 5610714
5610937 5611024 5611034 5612161
I'd speculate that cases at the very edge of the punish/pardon limit are flagged for audit.
edit- decided to poke into it, and in these cases there's some... pretty obvious intentional feeding going on. Another theory could be that the cases are marked no punishment due to concurrent punishment....
edit the second- those cases have a high incidence of duplication. Likely database error and, as such, duplicates don't get a punishment.
© 2013 Riot Games, Inc. All rights reserved. Riot Games, League of Legends and PvP.net are trademarks, services marks, or registered trademarks of Riot Games, Inc.