Messages - markus

#331
http://dominion.lauxnet.com/scavenger/?user=Dwedit helps to find the game number and when it was rated.

The bug occurs at:
Game Id: 3608195, decision index: 97
#332
General Discussion / Re: Leaderboard
10 May 2017, 11:54:12 PM
Thanks, I just want to correct two things  8)
1) The daily decrease in phi in Glicko doesn't actually depend on the results of the games that you play.
2) Sigma affects everyone and not just those that haven't played. This is the reason why phi doesn't get close to 0, even after many games. If you play more games per day (!) it will converge to a lower level. What that level is depends on sigma and the quality of your opponents. If you then cut back on the number of games per day, phi will increase and converge to a higher level.
#333
General Discussion / Re: Leaderboard
10 May 2017, 09:19:45 PM
mu is the estimate of a player's skill: the difference between two players' mu primarily determines the win probability:

mu-difference: 0.5   1   1.5   2
prob. of win:  62%  73%  82%  88%


phi measures how certain the system is about the skill mu (95% of players are supposed to have their true skill between mu-2*phi and mu+2*phi). It decreases when games are played - more so if phi is high (uncertain own rating), if the opponent has a low phi (certainty about the opponent's rating), or if the opponents are more equal (more informative game outcome).
In turn, phi affects how much mu changes when the actual wins differ from the expected wins. A higher phi lets mu change by more (the change in mu is roughly phi^2*(wins-exp.wins)).

sigma determines how much phi increases per day (after having potentially been decreased by playing games). This captures that we become more uncertain about a player's skill as time passes. In particular, phi_new = sqrt(phi^2+sigma^2) if a player doesn't play.
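
To make these relationships concrete, here is a minimal Python sketch of the formulas above (an illustration, not the server's implementation; the win probability uses the plain logistic form, ignoring the g(phi) correction that full Glicko applies for the opponent's uncertainty):

import math

def win_prob(mu_diff):
    # Simplified expected score: logistic in the difference of mu.
    return 1.0 / (1.0 + math.exp(-mu_diff))

def phi_after_idle_day(phi, sigma):
    # Daily increase in uncertainty when no games are played.
    return math.sqrt(phi**2 + sigma**2)

def approx_mu_change(phi, wins, expected_wins):
    # Rule of thumb from above: mu moves by roughly phi^2 * (wins - exp. wins).
    return phi**2 * (wins - expected_wins)

for d in (0.5, 1.0, 1.5, 2.0):
    print(d, round(win_prob(d), 2))  # 0.62, 0.73, 0.82, 0.88 as in the table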
#334
General Discussion / Re: Leaderboard
10 May 2017, 06:16:47 PM
I tried the above suggestions with the rated 2 player game results from the first 38 days and created the attached figures.

It was not surprising that most of the gain comes from choosing a more suitable initial phi than 2, which turned out to be too large. Sigma could also be lowered a bit, but that affects the results less and (hence) is harder to estimate. (Intuitively, it measures how much a player's skill can change over time, which is hard to know after a month.)

What I also liked is the iteration on the daily results: it means that you use the opponents' end-of-day rating to calculate your rating change, which helps when matched up with newer players, whose ratings still change a lot within a day.

Other improvements (boost for exceptional performances, letting sigma depend on mu and phi) didn't matter much. So I left them out here for the sake of keeping things simple.

The estimates are around initial phi=0.75 and sigma=0.05. Therefore, I'm plotting 3 versions:
1)   current system (phi0=2, sigma=0.06) in red,
2)   phi0=0.75, sigma=0.05 in blue
3)   phi0=0.75, sigma=0.05 and adding the iteration in green

To get the estimates, I minimized the average "discrepancy" of rated games (negative log-likelihood for the experts), that is -s*log(p) - (1-s)*log(1-p), where s is the outcome of the game (0 / 0.5 / 1) and p is the predicted win probability. The most boring rating system would always predict a 50% chance of winning, which results in a discrepancy of 0.693. So that's where the curves on day 1 start in the top left panel. I discarded the first 2 weeks when estimating the parameters, because everyone starting from scratch is not going to be representative of the future.

You can see in the top left panel that naturally all systems get better over time. Interpreting the absolute values of discrepancy is difficult, because it depends on the (quality of) match-making: if everyone was matched against an equal opponent, you can't do better than predicting 50% and will be wrong 50% of the time (discrepancy=0.693). If a strong player plays someone weaker and you correctly predict an 80% win chance, the discrepancy is 0.500.
So what happens if you, for example, predict a 65% chance of winning while the true one is 70%? The (average) discrepancy will be 0.616 instead of 0.611 for the best prediction. I'm saying this because even though the differences between the curves might not seem too big, they can actually mean a lot in terms of predictive power.
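
To make the discrepancy measure concrete, here is a small Python sketch that reproduces the numbers above:

import math

def discrepancy(s, p):
    # Negative log-likelihood of outcome s (0 / 0.5 / 1)
    # under the predicted win probability p.
    return -s * math.log(p) - (1 - s) * math.log(1 - p)

def avg_discrepancy(p_pred, p_true):
    # Expected discrepancy when predicting p_pred while the
    # true win probability is p_true.
    return p_true * discrepancy(1, p_pred) + (1 - p_true) * discrepancy(0, p_pred)

print(round(discrepancy(1, 0.5), 3))         # 0.693: always predicting 50%
print(round(avg_discrepancy(0.8, 0.8), 3))   # 0.500: correct 80% prediction
print(round(avg_discrepancy(0.65, 0.7), 3))  # 0.616: predicting 65% when 70% is true
print(round(avg_discrepancy(0.7, 0.7), 3))   # 0.611: the best possible prediction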

For the second panel in the top row, I use the number of days a given player has played on the x-axis (again discarding games in the first 2 weeks). So the first point is new players that only started playing rated games after 2 weeks. You can see that the optimized coefficients do better in the beginning. After all, most of the gains come from picking a better initial value for phi. Hopefully, there will be many new players in the future too, so it would be good to do a better job there.

The third panel in the top row conditions on mu at the end of the 38 days (i.e. how good the third model believes players currently are). You can see that most of the gains come from players in the middle of the distribution. That's not surprising, because they account for the most games played, and I didn't give any higher weight to, for example, top players when minimizing discrepancy.

The first panel in the middle row shows the bias of the predicted win probability. The red curve (current system) shows for example that players with an expected win probability between 70% and 80% actually won 5% fewer games than predicted. Ideally, you want to have these curves around 0, which the improved versions mostly achieve. (My intuition for the bias is that with high initial phi players with good/bad results in the first days over/under-shoot their actual skill mu, such that they get too high/low expected win probabilities in the following days.)

The middle panel shows the cumulative distribution function for mu at the end of the sample (what fraction of players is below a given value for mu). The optimized ones are much less dispersed than the current one. This is mostly due to players with only few games, who'll stick closer to mu=0, when initial phi is lower. This also means that a player who is truly very good/bad will take a bit longer to reach the appropriate mu, as each game's result has less of an influence on mu.

The right panel in the middle row shows the resulting cumulative distribution function for phi. In general, phi will be lower because of a lower initial value and a smaller daily increase (showing up mostly for players with many games in the bottom left). Phi is also capped to be at most 0.75.

The bottom row looks at the "rating deflation" that seems to be going on. I should say first that this is not a big problem, because only absolute differences in mu between players matter, so a shift of the whole distribution doesn't change predicted win probabilities, for example. In the bottom left panel you can see that the average mu of players that have already played up to that day falls over time. This can happen in theory, because the gain in mu of the winner is generally not equal to the loss in mu of the loser of a game. In particular, players with a higher phi (newer or less frequent players) get a bigger positive or negative adjustment of mu. Therefore, the decline of average mu suggests that players with a high phi underperform expectations. That could happen because better players played from day 1 / earlier, and newer players are worse than the average older player. The good thing is that this shouldn't go on forever, because adding new players at mu=0 (higher than the average) counteracts this effect. And with a bit of fantasy one can see the red curve at least beginning to flatten out over time.

The middle and right panel in the bottom row show how the cut-offs for top 1% and top 10% / top 100 and top 1000 players evolved over time.
#335
General Discussion / Re: Leaderboard
08 May 2017, 01:51:13 PM
Quote from: Stef on 08 May 2017, 12:42:12 PM
The lower initial phi of 1 instead of 2 seems good. It now seems to needlessly penalize new players. But can you back it up with some argument/example? Suppose a new account does pretty well on its first day, an 8-2 record against various random opponents, what would the resulting rating be for these two options?

I wouldn't think of high phi as "penalizing" - that is "only" true for the rank in the leaderboard. High phi primarily means that mu changes more in response to under/overperforming.

For the 8-2 example that means (I take opponents to have mu=0 as well):
phi=2: new mu=1.47, if opponents are new (phi=2) / mu=1.11, if opponents have phi=0.4
phi=1: new mu=0.90, if opponents are new (phi=1) / mu=0.87, if opponents have phi=0.4
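
For anyone who wants to reproduce these numbers, here is a sketch of a one-period Glicko-2 update on the natural (mu, phi) scale. It omits the volatility step, so the first number comes out as 1.46 rather than 1.47; the others match:

import math

def g(phi):
    # Dampens the impact of games against uncertain opponents.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi**2 / math.pi**2)

def expected_score(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))

def glicko_update(mu, phi, outcomes):
    # outcomes: list of (opponent_mu, opponent_phi, score), score in {0, 0.5, 1}
    v_inv = 0.0
    delta_sum = 0.0
    for mu_j, phi_j, s in outcomes:
        e = expected_score(mu, mu_j, phi_j)
        v_inv += g(phi_j)**2 * e * (1.0 - e)
        delta_sum += g(phi_j) * (s - e)
    phi_new = 1.0 / math.sqrt(1.0 / phi**2 + v_inv)
    mu_new = mu + phi_new**2 * delta_sum
    return mu_new, phi_new

def record_8_2(opp_phi):
    # 8 wins and 2 losses against opponents with mu=0.
    return [(0.0, opp_phi, 1.0)] * 8 + [(0.0, opp_phi, 0.0)] * 2

print(round(glicko_update(0.0, 2.0, record_8_2(2.0))[0], 2))  # 1.46 (1.47 above)
print(round(glicko_update(0.0, 2.0, record_8_2(0.4))[0], 2))  # 1.11
print(round(glicko_update(0.0, 1.0, record_8_2(1.0))[0], 2))  # 0.90
print(round(glicko_update(0.0, 1.0, record_8_2(0.4))[0], 2))  # 0.87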

Quote from: Stef on 08 May 2017, 12:42:12 PM
I'm uncertain about using (mu - phi) over (mu - 2 * phi). While there is no need for quick degeneration of levels, I do actually believe you get worse at Dominion if you don't play for a while, and I don't want people that haven't played for half a year to still be near their old rating/rank.
First, my hunch is that currently (average) phi is too high, because sigma is too high. (Especially at the top of the leaderboard). That makes subtracting 2*phi more important. I don't have a strong opinion on that, but I would continue to subtract at least 1.5*phi.

There is no degeneration of mu in Glicko - you could add that, but we don't have any observations yet of players not playing for a while, so it would be a bit arbitrary. But you could for example subtract 0.1 for players who haven't played at all, 0.09 for players with 1 game, ..., and 0 for 10 games or more, as sketched below. This could lead to an overall deflation of rankings, so you would have to boost everyone's mu by a tiny bit. (I actually noticed that the average mu in the current system keeps falling, currently at -0.38.)
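
As a sketch, that (hypothetical) penalty schedule could look like this:

def inactivity_penalty(games_played):
    # 0.1 for no games, decreasing by 0.01 per game,
    # down to 0 for 10 or more games.
    return max(0.0, 0.1 - 0.01 * games_played)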

To get a degeneration of the rank in the leaderboard, it's already enough if phi increases. Sigma determines how strong that is. The underlying assumption currently would be that after half a year of not playing, a third of players should have improved or lost their skill mu by at least 0.8. That seems too much to me for players at the top, but maybe for the average player that's accurate.

Quote from: Stef on 08 May 2017, 12:42:12 PM
I am tempted to change the system to immediate updates over daily updates. Not that I really prefer that myself, but people seem to like it and it prevents some questions about the current leaderboard.

Probably the way to go is, like Scavenger, to display an updated rating that is at least approximately the new one at midnight - and to use this for matchmaking purposes. (With the current algorithm it's possible to predict it exactly; with some refinements like the iteration suggested in 3) above, it wouldn't be exact anymore.)


Quote from: Stef on 08 May 2017, 12:42:12 PM
Most of all I don't want to introduce new rules/parameters too often. Ideally you or someone could compose an actual short list of proposed changes and when there are no valid counterarguments we just do that. That would require filling in some more details.

I agree that this shouldn't change every month. Now, it would make sense to do something, because we have actual data to base it on / try it out. If you can provide for example a CSV file with the rated games (day,Player1,Player2,result), I'm happy to play around and make some suggestions.
#336
General Discussion / Re: Leaderboard
08 May 2017, 12:15:28 PM
I looked a bit more into potential improvements of the algorithm. I think there should be enough data for rated 2p-games (more than 300,000) to estimate a better initial phi and sigma. If there's an easy way to provide a list of these game outcomes, I'm also happy to play around with this or the suggestions from Glicko-boost below.

I think that having a variable sigma doesn't do much. (That was the innovation from Glicko to Glicko-2.) In theory, it's nice if more consistent players have a lower phi (more certainty about their rating), but in practice that's hard for the algorithm to pick up. funkdoc has the minimum sigma=0.0595 in the top 20 now. Relative to sigma=0.06, that means a phi of 0.1597 increases after one day of not playing to 0.1704 instead of 0.1706.
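
That's just the daily phi formula from above; with the phi_after_idle_day sketch from further up:

print(round(phi_after_idle_day(0.1597, 0.0595), 4))  # 0.1704
print(round(phi_after_idle_day(0.1597, 0.06), 4))    # 0.1706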

Also, in the example in Mark Glickman's paper (http://glicko.net/research/dpcmsv.pdf), in Figures 3 and 4, it's apparent that sigma doesn't change much. And most of the difference between the top two panels comparing Glicko and Glicko-2 arises because sigma=0.01 in the constant-variance case, while it's initialized with sigma=0.05 in the stochastic-variance case.
Bottom line is, it doesn't hurt for now, but I think there are better ways in which making the algorithm more complex improves ratings.


I like the extensions that Glickman used in his more recent Glicko-boost algorithm (http://glicko.net/glicko/glicko-boost.pdf):

1)   First-mover advantage: in the long run, it doesn't really matter for the rating, as players play roughly 50% of their games as first player, but in the short run it causes unnecessary rating fluctuations. This is overcome by adding something to the mu of the first player when evaluating the outcome. In the simplest case it would be a constant, but one could also estimate more complicated forms (e.g. depending on the difference in skill, or the absolute level of skill).

2)   Phi boost based on exceptional performance: the idea is that on average players could have a lower phi, making their ratings less swingy once they have stabilized. But if someone plays exceptionally strongly, their phi is increased, to make climbing the leaderboard easier and to reflect more uncertainty about their rating. This serves a similar purpose as the variable sigma in the current system, but it apparently works better - at least Glickman used it for the more recent algorithm.

3)   Iterating on the ratings update (see the sketch after this list): this is useful mainly for games involving newer players. The idea is to use the opponent's rating not from the beginning of the day, but from the end of the day. So if the opponent I beat today also lost a lot of other games, I'll get less of a boost than if they won their other games. That should prevent some of the extreme mu's that we have seen after the first few days, when a player with a high phi beats a stronger player / loses to a weaker player a couple of times.

4)   Sigma depending on mu and phi: I'm not so sure about that one, but one could estimate the daily phi increase based on mu and phi. The example would be that someone with a high mu is more likely to have a stable skill than someone who's still learning the game. So their phi would increase less per day and thus be lower. This would stabilize the mu of better players. Of course, it might be that estimating the parameters leads to the opposite result - or doesn't make much of a difference, in which case it doesn't have to be implemented.
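
For 3), here is a rough sketch of how such an iteration could look, reusing the glicko_update function from the sketch further up (this is my guess at the scheme, not necessarily Glickman's exact procedure):

def iterated_daily_update(start, games, rounds=10):
    # start: dict player -> (mu, phi) at the beginning of the day
    # games: list of (player_a, player_b, score_for_a) for the day
    # Each pass redoes everyone's daily update, but evaluates the games
    # against the opponents' latest (end-of-day) estimates.
    current = dict(start)
    for _ in range(rounds):
        new = {}
        for p, (mu, phi) in start.items():
            outcomes = []
            for a, b, s in games:
                if a == p:
                    mu_o, phi_o = current[b]
                    outcomes.append((mu_o, phi_o, s))
                elif b == p:
                    mu_o, phi_o = current[a]
                    outcomes.append((mu_o, phi_o, 1.0 - s))
            new[p] = glicko_update(mu, phi, outcomes) if outcomes else (mu, phi)
        current = new
    return current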
#337
Rematch should definitely be allowed. I'm happy to play again, once I've found a nice opponent.

If I leave the table and search again, I'd obviously rather have someone new even if I don't blacklist the old opponent. Having more options to define an "equal opponent" would already help.
#338
Other Bugs / Re: Game destroyed by Undo
02 May 2017, 10:28:48 PM
I have a league game against drsteelhammer that might help locate the bug.

Scheme was included in game 3276955. On my last turn, I wanted to undo a Masquerade play. dsh granted, but it kept pending on my side. dsh then resigned.

After reloading at the last decision point, the log (and deck contents) were different than before (e.g. I was supposed to have trashed an Encampment, and dsh had bought a Moat, neither of which happened in the original game). That was game 3277403. (I think the log was already off after dsh resigned the first game.)

We then reloaded at the earlier decision point 150 (now game 3277461), and everything was fine (other than the last turn obviously missing). We didn't try any more undos.  8)

Another (probable) bug is that both of the loaded games show up in "Scavenger", even though they were selected to be unrated (and loaded games shouldn't be rated). And I suspect that they will also be rated in the official leaderboard.
#339
Fool's Gold is something where auto-buy could be improved, because it currently always counts it as 1 coin.
#340
Regarding 1) debt cards: there must be an option to buy a debt card with auto-buy on without playing all (basic) treasures (edge cases are e.g. Haunted Woods or Possession).

To be honest, I'm fine with auto-buy being something only for single buys not involving debt. Misclicks for single vs. multiple buys wouldn't be a problem if "no-info" undo were available (barring edge cases like Inn). Auto-buying debt cards without playing enough treasures to cover the cost could give a warning, because the turn ends automatically otherwise and info is revealed.
#341
General Discussion / Re: Stuck with lower rating
28 April 2017, 05:40:18 PM
Quote from: jeebus on 28 April 2017, 04:53:02 PM
Is it plus/minus 30 just like the default when you start a table?
The maximum difference is about 14.5 levels now. If both players have the same phi, that means a maximum difference in mu of about 2. And that means a maximum implied winning probability of about 88%.
Things can get more unequal, if the better player has played less (high phi) than the worse player (low phi).
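
(With the win_prob sketch from further up: win_prob(2.0) gives 0.88, matching the 88%.)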
#342
Instead of clicking "Login", you should try "Kick & Resign" below. That should get rid of the dead game.

In terms of not getting to the result screen, it seemed that this problem existed on Firefox for me but not on Chrome - and also only in some places (Windows 10).
#343
General Discussion / Re: Stuck with lower rating
28 April 2017, 09:52:11 AM
I agree that better matching (in the sense of more equal opponents) would be good, because closer games are more enjoyable. And I can personally take away more from a loss when my opponent does something reasonable than when he opens Treasure Map/Silver and somehow gets lucky.

But even within the current bounds for an "equal opponent", it's not a problem for the rating system.
#344
General Discussion / Re: Stuck with lower rating
28 April 2017, 12:35:12 AM
Playing a lot of matches matters at the top insofar as it reduces phi (a 0.1 difference in phi gives 1.5 levels). But that is not the main reason you're further down.

Also, in contrast to Isotropish, phi increases quickly again if you stop playing, or play less than before. So that effect on the level is more transitory than getting a high mu.
#345
General Discussion / Re: Stuck with lower rating
28 April 2017, 12:25:22 AM
You might want to check http://dominion.lauxnet.com/scavenger/?user=jeebus

What matters for rising is that your actual wins exceed the expected wins. Then rising can actually go quite fast - and you would only be "stuck" with a rating if your phi were very small (but that is not happening with the currently chosen sigma).

What makes it harder at the top is that the matched opponents are weaker on average - over your whole history you should win 3/4 of your games. Only if you manage to win more will you rise. And losing a game to a weak opponent due to a computer crash hurts a lot.

As a rule of thumb, your mu changes by phi^2*(wins-exp.wins). In your case today (after 7-1 games), that would be 0.35^2 * 1.3=0.16.
1 point in mu corresponds to about 20 levels in the Isotropish leaderboard, i.e. 1 level now is worth more than 2 levels in Isotropish. So gaining "1 level" is more significant than it used to be.
Still, given your mu>60 in Isotropish, you should probably get to at least mu=2 here.