General discussions: Elo Rating problem is ready for testing

Rodion (admin) 2024-05-24 07:51:38

Hi Friends and please sorry for delay - our colleague Kevin had sent me this problem about week or more ago.

It seems pretty straightforward but may need some attention to math (property I generally lack) and also there were minor code modifications to the checker - so I gladly invite you to verify it please!

Elo Ratings

gardengnome 2024-05-24 10:55:54

In larger examples, there are many interim elo values that are very similar - with deltas below 1e-12 - so the order of the players depends on marginal differences which of course has a big impact on the final result. Maybe rounding elo values after each round, or to x decimal places, could help.

CSFPython 2024-05-24 12:47:54

Kevin,

Thanks for the interesting problem and for the interesting description of the calculations used for the Elo system.

I agree entirely with Matthias. The fact that every initial Elo value is the same means that there are far too many comparisons of supposedly identical real numbers as the rounds progress. The orders of the players cannot be guaranteed in this way. I think that the best way to avoid this is to start with different Elo values for each of the players. This is easily done by using the random generator. For example, if the random value is R we could use this to define an Elo value of E = 900 + R%600 which clearly gives a random value between 900 and 1599. These numbers are merely an illustration for the general idea. The main thing is to avoid getting many duplicate Elo values at the start. Another possibility would be to have the Elo values increasing by 1 for each successive player, e.g. 900, 901, 902 ...

There are two other minor observations:

In the final line of the problem description Player 1 loses; otherwise the numbers are correct.

In the line above the PointsTransferred equation, the points are transferred from Player B to Player A (and not from A to B as stated). For example, if Player A wins, the Result is 1 so the points transferred will be positive; so these must then be transferred from player B to player A.

CSFPython 2024-05-24 13:02:48

I have had further thoughts about this problem. Given the number of competitors and rounds that are being used, there is still too much scope for clashes of supposedly identical real numbers, using either of the suggestions that I made in the previous post. I think a solution might be to start every Elo value as a distinct real number, avoiding integers altogether. If we generate 3 random numbers (r1, r2 and r3), using the generator, we could use a formula similar to the following:

E = 900 + r1%600 + r2/r3  where the division is real (not integer)

Once r2 and r3 are generated, it might be a good idea to swap them over if r2 > r3. This is probably unnecessary but there are some circumstances where r2/r3 could be quite large!

moxieman 2024-05-24 19:17:40

Thank you Rodion for consideration of the problem, and thanks to all for the thoughts and review!

I see what you both mean, that the comparison very similar Elo values could have large impact on the final result. In my opinion the most straightforward approach would be to round the resulting ratings to the nearest 0.01 after every game. I thought I had read somewhere that real-life organizations rounded to 0.01 after real games, but I'm having trouble finding where I saw that now...

The only possible issue that I could see coming along with rounding would be if we ever expected some post-game rating to have a decimal tail ending 0.##5, in which case we'd have to specify type of rounding (e.g. bankers, hafz, etc). But I actually don't that could occur given two input ratings rounded to 0.01...?

If rounding to 0.01 is the path forward, then maybe it would be more straightforward to request the answer values to be rounded to 0.01 as well, instead of to full integers?

Also, I agree with gardengnome's other two observations above regarding typos in the original problem - it should be "transferred from Player B to Player A" and also "Player 0 wins and and Player 1 loses".

CSFPython 2024-05-24 19:34:29

Kevin, Rounding to 2 decimal places should work. The scenario you mention above is unlikely to happen. If you make K a little less round (e.g. 17) the likelihood becomes even smaller. With the large number of competitors the numbers of identical Elo values will remain quite large for several rounds. But I assume that this is intentional.

A question occurs to me. Will 2 equal real numbers which are rounded to 2 decimal places be regarded as identical in all systems or could comparisons place them in arbitrary orders?

zelevin 2024-05-24 19:50:06

I think the easiest approach would be to round to the integer after every round, and break the ordering ties in the beginning of every round by ID (which is guaranteed to be unique).

CSFPython 2024-05-24 19:54:12

Vladimir, I like that idea, especial as it is so simple.

moxieman 2024-05-24 21:07:02

I like the idea of rounding to integers after every round as well!

The identical Elo pairings wasn't really intentional for any reason other than I wasn't trying to avoid it. It was somewhat satisfying to see the rankings start to line up with the assigned skill levels over semi-random simulation. I was also trying to keep the testcases within a range that likely avoids negative Elo ratings (not that there's anything explicitly prohibiting negative ratings), and more random initial Elo ratings might make that less predictable...

As for the tie-breaking, the checker is already ensuring that the LCG seed given won't generate any duplicate skill values across the range of players, so those are guaranteed to be unique for the purpose of breaking ties :)

Rodion (admin) 2024-05-25 05:32:23

Hi Friends!

Now the problem is updated with the code and text amendments that Kevin had sent to me some hours ago. Let's see if it works and if it is enough. As for tie-breaking and Kevin's explanations - should this be added to the problem statement (as probably this may be not obvious?)

gardengnome 2024-05-25 06:06:33

Test passed, which means the results are becoming more reproducible which is good, many thanks.

Just one more observation: the two statements "System is symmetrical in that one player wins or loses the same amount of points that the other loses or wins" and "Each player's Elo Ratings are updated accordingly and rounded off to the nearest integer at the end of each Round" are not fully compatible anymore. Should the player's rating be rounded, or better the transferred points? Imagine the points transferred being 10.5 ... currently one player gains 11 points while the other one loses 10 points after rounding.

CSFPython 2024-05-25 10:48:00

Mathias, I agree with your concern over the lack of symmetry for the 0.5 case. However, the points transferred are as often negative as positive. There would probably need to be some agreement on how to do the rounding; for example using the absolute value. This could lead to confusion so is probably best avoided. We can remove the problem altogether by changing the value of K to 17, 19 or 23. The 0.5 case will then not occur so there would be no need for any kind of correction.

ecolog_veteran 2024-06-01 13:59:18

I noticed that in the problem description where game rounds are explained there are a few typos: in the first two rounds some players end up with rating of 2010 but it is not possible to jump from 1200 to 2010 at once since the maximum amount of points to be transfered is the value of K by absolute value.

Plus it would be better to highlight that ratings are rounded to the nearest integer at the end of each round.