I'd like to remark that the level, as displayed in the game client, is kind of a strange beast. For example, when looking at http://dominion.lauxnet.com/leaderboard/?full=true
at this moment, my level is 46.06, and the same page shows players with levels from 46.00 to 46.10. I guess most people would intuitively say that all those players have the same strength and that the difference is negligible.
But if you look at the μ-value of those players (which represents the (unscaled) strength of the players as estimated by the ratings algorithm, see http://forum.shuffleit.nl/index.php?topic=1679.msg5891
for details), my μ-value is 0.06, but there are 3 μ-values above 1 and the lowest is -0.17 (all from players with levels between 46.00 and 46.10).
If we use the scaling factor of 7.5 as detailed in the above-mentioned post, the difference in μ-value between -0.17 and 1.31 (the highest value) is actually 11.1 levels as expressed in the game client, which is roughly 100 times more than the displayed range of 46.00 - 46.10 would suggest.
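To make that arithmetic easy to check, here is a minimal sketch in Python. The function name and variable names are my own for illustration; only the scaling factor of 7.5 and the μ-values come from the linked post and the leaderboard.

```python
SCALE = 7.5  # scaling factor quoted from the linked post


def mu_gap_in_levels(mu_low, mu_high, scale=SCALE):
    """Convert a difference in mu into displayed-level units (hypothetical helper)."""
    return (mu_high - mu_low) * scale


# Lowest and highest mu among the players shown at levels 46.00 to 46.10.
gap = mu_gap_in_levels(-0.17, 1.31)

# Width of the displayed level range, 46.00 to 46.10.
displayed_range = 0.10

print(f"gap in levels: {gap:.1f}")
print(f"ratio to displayed range: {gap / displayed_range:.0f}")
```

The ratio comes out at a bit over 100, which is where the "roughly 100 times" figure comes from.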
The reason is that the level is calculated not just from μ, but also from φ, an estimate of how inaccurate μ itself is. Basically (and for details I again refer to the above-mentioned post by Stef), the fewer games you have played, the higher the perceived inaccuracy φ. Conversely, when you have played many (hundreds of) games, your φ becomes small.
Since the level is calculated as 50 + 7.5 * (μ - 2φ), a player with μ=1.20, φ=0.86 (13 games played) is shown as having the same level (46.06) as me, with μ=0.06, φ=0.29 (79 games played).
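The level formula can be sketched in a few lines of Python. The function name `level` and its parameter names are my own; the constants 50 and 7.5 and the (μ, φ) pairs are the ones quoted above. Note that with the two-decimal values shown on the leaderboard the formula gives about 46.10 for both players rather than the displayed 46.06; I assume the small gap comes from rounding in the displayed μ and φ.

```python
def level(mu, phi, base=50.0, scale=7.5):
    """Displayed level: a scaled version of the lower bound mu - 2*phi."""
    return base + scale * (mu - 2 * phi)


# Player with few games: high mu, but a large uncertainty phi.
newcomer = level(mu=1.20, phi=0.86)

# My own values: much lower mu, but a small phi after 79 games.
me = level(mu=0.06, phi=0.29)

print(f"newcomer: {newcomer:.2f}")
print(f"me:       {me:.2f}")

# Without the 2*phi penalty, the same two players are far apart.
print(f"mu-only gap in levels: {7.5 * (1.20 - 0.06):.2f}")
```

Both players land on the same level because μ - 2φ is -0.52 in both cases, even though their μ-values alone differ by about 8.5 levels.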
The problem I see is that the person with μ=1.20 is *clearly* better than me. That the rating algorithm is less sure of that person's exact strength is understandable, since 13 games is not much to go by, but putting them on my level is (at least to me) obviously wrong.
The stated reason for using a scaled version of (μ-2φ) is (quoting from the above-mentioned post):
Glicko2 claims it's 95% certain your actual skill is between (μ-2φ) and (μ+2φ), and suggests using (μ-2φ) for rating / leaderboard, so we're doing that.
I understand that if you have some quantity that objectively exists, and you want to judge whether a person meets a certain threshold for that quantity, you want to err on the side of caution. In that case, using the 2.5% error quantile is reasonable.
But while there is obviously a skill cascade involved with Dominion, I think that the concept of a 'pre-existing' numerical skill level that the Glicko2 rating is estimating is not valid. So I think that using (μ-2φ) is a bad band-aid that declares players of widely differing skills as "equal levels".