Numerical ratings example using Least Squares

Post by LSQRANK » Sat Jan 01, 2022 2:02 pm

From time to time each season, I get questions about how the least-squares numerical model works when applied to rating the boys and girls HS ice hockey teams. Below I provide a small example of how the data is processed and what the results mean in terms that matter to the average hockey fan. Currently, I am using the "raw" game score differential as the observation value because game scores are the easiest and quickest data to collect and process. If one wanted to spend more time and effort to achieve a possibly better rating profile, one could modify the raw game score by accounting for several factors: empty-net goals, goals not scored due to running-time protocols, power-play goals, shot differentials, home vs. away effects, distance traveled, rest days between games, etc.

-enjoy Doc

This problem set consists of actual games played in Nov & Dec of 2021 in the MN State HS league.
Can you guess who the 6 teams actually are?  (see below)

Suppose that there exist six teams (A,B,C,D,E,F)
that play 12 games among themselves, such that,

game #1    Team_A     Team_B       4    3   ;    Team_A beats Team_B by 1 goal
game #2    Team_B     Team_C       4    2   ;    Team_B beats Team_C by 2 goals
game #3    Team_C     Team_D       4    1   ;    Team_C beats Team_D by 3 goals
game #4    Team_D     Team_A       6    0   ;    Team_D beats Team_A by 6 goals
game #5    Team_E     Team_B       3    5   ;    Team_B beats Team_E by 2 goals
game #6    Team_E     Team_D       5    3   ;    Team_E beats Team_D by 2 goals
game #7    Team_E     Team_C       5    6   ;    Team_C beats Team_E by 1 goal
game #8    Team_F     Team_B       1    3   ;    Team_B beats Team_F by 2 goals
game #9    Team_F     Team_D       2    3   ;    Team_D beats Team_F by 1 goal
game #10   Team_F     Team_A       3    2   ;    Team_F beats Team_A by 1 goal
game #11   Team_F     Team_C       0    3   ;    Team_C beats Team_F by 3 goals
game #12   Team_A     Team_B       5    4   ;    Team_A beats Team_B by 1 goal

Let the ratings value for Team_A = 1000.000

Relative to Team_A, 
calculate the rating values for the 5 other teams (B,C,D,E,F)
such that the sum of the squares of the observation residuals is minimized.

If one chose the following rating values, 

Team_A = 1000    (given)
Team_B = 1003
Team_C = 1003.5
Team_D = 1002
Team_E = 1002.5
Team_F = 1001

they would be close. What are the residual values in this case?

The residual value is denoted by little "v", and is calculated as

v = Observed_Value  minus  Predicted_Value

Game#   H - A = Observed    Home   -  Away    = Predicted     v      v_squared
=====   =   =   ========    ====      ====      =========     ===    =========
 1      4   3     1         1000      1003         -3.0       4.0       16.00
 2      4   2     2         1003      1003.5       -0.5       2.5        6.25
 3      4   1     3         1003.5    1002          1.5       1.5        2.25
 4      6   0     6         1002      1000          2.0       4.0       16.00
 5      3   5    -2         1002.5    1003         -0.5      -1.5        2.25
 6      5   3     2         1002.5    1002          0.5       1.5        2.25
 7      5   6    -1         1002.5    1003.5       -1.0       0.0        0.00
 8      1   3    -2         1001      1003         -2.0       0.0        0.00
 9      2   3    -1         1001      1002         -1.0       0.0        0.00
10      3   2     1         1001      1000          1.0       0.0        0.00
11      0   3    -3         1001      1003.5       -2.5      -0.5        0.25
12      5   4     1         1000      1003         -3.0       4.0       16.00
                                                                        _____
                                                                sum =   61.25
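
The residual column above can be recomputed with a few lines of Python. This is only a sketch: the GAMES list, the TRIAL dictionary and the residuals function are names made up for this example. Each game is stored as (first team, second team, first score, second score), and the prediction is taken as the first-listed team's rating minus the second's.

```python
# The 12 games from the list above: (first team, second team, first score, second score).
GAMES = [
    ("A", "B", 4, 3), ("B", "C", 4, 2), ("C", "D", 4, 1), ("D", "A", 6, 0),
    ("E", "B", 3, 5), ("E", "D", 5, 3), ("E", "C", 5, 6), ("F", "B", 1, 3),
    ("F", "D", 2, 3), ("F", "A", 3, 2), ("F", "C", 0, 3), ("A", "B", 5, 4),
]

# The trial rating values suggested above.
TRIAL = {"A": 1000.0, "B": 1003.0, "C": 1003.5, "D": 1002.0, "E": 1002.5, "F": 1001.0}

def residuals(games, ratings):
    """Return v = observed - predicted for each game, in game order."""
    out = []
    for t1, t2, s1, s2 in games:
        observed = s1 - s2                      # score differential, first minus second
        predicted = ratings[t1] - ratings[t2]   # rating differential, same order
        out.append(observed - predicted)
    return out

v = residuals(GAMES, TRIAL)
for game_no, r in enumerate(v, start=1):
    print(f"{game_no:2d}  v = {r:5.1f}  v_squared = {r * r:6.2f}")
print("sum of v_squared =", sum(r * r for r in v))
```

Swapping in any other set of guesses for TRIAL shows quickly whether the sum of squares goes up or down.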

Using numerical matrix methods, one can solve for the rating values that minimize the sum of the squares of the residuals,
which is where the term "least squares solution" gets its name.
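
For anyone who wants to try this at home, here is a minimal sketch of such a matrix solve in Python with NumPy. It is not the program that produced the LSQRANK numbers; the function name solve_ratings, the GAMES tuples and the use of numpy.linalg.lstsq are all assumptions made for illustration. Each game becomes one row of a design matrix (+1 for the first-listed team, -1 for the second), Team_A is held at 1000, and the solver finds the offsets of the other five teams.

```python
import numpy as np

# The 12 games from the list above: (first team, second team, first score, second score).
GAMES = [
    ("A", "B", 4, 3), ("B", "C", 4, 2), ("C", "D", 4, 1), ("D", "A", 6, 0),
    ("E", "B", 3, 5), ("E", "D", 5, 3), ("E", "C", 5, 6), ("F", "B", 1, 3),
    ("F", "D", 2, 3), ("F", "A", 3, 2), ("F", "C", 0, 3), ("A", "B", 5, 4),
]

def solve_ratings(games, anchor="A", anchor_value=1000.0):
    """Least-squares ratings with the anchor team's rating held fixed."""
    teams = sorted({t for g in games for t in g[:2]})
    free = [t for t in teams if t != anchor]       # unknowns: every team except the anchor
    col = {t: j for j, t in enumerate(free)}
    A = np.zeros((len(games), len(free)))          # one row per game
    y = np.zeros(len(games))                       # observed score differentials
    for i, (t1, t2, s1, s2) in enumerate(games):
        if t1 != anchor:
            A[i, col[t1]] = 1.0                    # +1 for the first-listed team
        if t2 != anchor:
            A[i, col[t2]] = -1.0                   # -1 for the second-listed team
        y[i] = s1 - s2
    x, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimizes the sum of squared residuals
    ratings = {anchor: anchor_value}
    ratings.update({t: anchor_value + float(x[col[t]]) for t in free})
    return ratings

for team, r in sorted(solve_ratings(GAMES).items(), key=lambda kv: -kv[1]):
    print(f"Team_{team}  {r:9.4f}")
```

Run on all 12 games as listed, this sketch gives about 1002.479 (C), 1001.641 (B), 1001.557 (E), 1001.551 (D) and 1000.168 (F). Run on GAMES[:-1], which counts the repeated A-over-B result once, it gives 1003.3750, 1002.9375, 1002.5000, 1002.1875 and 1000.8750, the same values as the LSQRANK table below. The ranking order is the same in both cases.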

The actual least squares solution to this particular problem set is:

LSQRANK   State Tier Rating Score    Team
------------------------------------------------------------------------------ 
     1   MN     HS     1003.3750    Team_C   MSHSL   0   xx      4  ( 3- 1- 0) 
     2   MN     HS     1002.9375    Team_B   MSHSL   0   xx      5  ( 3- 2- 0) 
     3   MN     HS     1002.5000    Team_E   MSHSL   0   xx      3  ( 1- 2- 0) 
     4   MN     HS     1002.1875    Team_D   MSHSL   0   xx      4  ( 2- 2- 0) 
     5   MN     HS     1000.8750    Team_F   MSHSL   0   xx      4  ( 1- 3- 0) 
     6   MN     HS     1000.0000    Team_A   MSHSL   0   xx      4  ( 2- 2- 0) 

Avg V =    1.302
max V =    3.938
Snot  =    3.174

DOFs = 6   ( equals 12 games - 6 teams)  
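
The max V and Snot figures can be checked from the ratings in the table with a short Python sketch. Snot is read here as the standard error of unit weight, sqrt(sum of v_squared / DOFs); that reading is an assumption, as are the names GAMES, LSQ and fit_stats. Avg V is left out because its exact definition is not stated.

```python
import math

# The 12 games from the list above: (first team, second team, first score, second score).
GAMES = [
    ("A", "B", 4, 3), ("B", "C", 4, 2), ("C", "D", 4, 1), ("D", "A", 6, 0),
    ("E", "B", 3, 5), ("E", "D", 5, 3), ("E", "C", 5, 6), ("F", "B", 1, 3),
    ("F", "D", 2, 3), ("F", "A", 3, 2), ("F", "C", 0, 3), ("A", "B", 5, 4),
]

# Rating scores copied from the LSQRANK table above.
LSQ = {"A": 1000.0, "B": 1002.9375, "C": 1003.375, "D": 1002.1875, "E": 1002.5, "F": 1000.875}

def fit_stats(games, ratings, dof):
    """Return (largest absolute residual, sqrt(sum of squared residuals / dof))."""
    v = [(s1 - s2) - (ratings[t1] - ratings[t2]) for t1, t2, s1, s2 in games]
    max_v = max(abs(r) for r in v)
    s0 = math.sqrt(sum(r * r for r in v) / dof)
    return max_v, s0

max_v, s0 = fit_stats(GAMES, LSQ, dof=6)
print(f"max V = {max_v:.3f}")   # max V = 3.938
print(f"Snot  = {s0:.3f}")      # Snot  = 3.174
```

The largest residual comes from the two Team_A wins over Team_B, which the ratings predict as Team_B wins by nearly 3 goals.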

To interpret the LSQRank results:
If Team_D were to play Team_A in the future, 
the best predicted outcome would be that Team_D would beat Team_A by approximately 2 goals
(1002.1875 - 1000.0000 = 2.1875).

Contrasting the least squares solution with the KRACH algorithm results:

Rank [--Team-] State   League  KRACH_Rating   SOS    WL_Ratio  (WW-LL-TT)
---- ------------------------  ------------  ------  ---------------- ----
   1   Team_C    MN   MSHSL      208.29      69.43     3.00    ( 3- 1- 0)
   2   Team_B    MN   MSHSL      118.96      79.31     1.50    ( 3- 2- 0)
   3   Team_A    MN   MSHSL       79.14      79.14     1.00    ( 2- 2- 0)
   4   Team_D    MN   MSHSL       76.13      76.13     1.00    ( 2- 2- 0)
   5   Team_E    MN   MSHSL       60.03     120.06     0.50    ( 1- 2- 0)
   6   Team_F    MN   MSHSL       35.35     106.04     0.33    ( 1- 3- 0)

To interpret the KRACH results:
If Team_D were to play a series of games against Team_A in the future, 
the best predicted outcome would be that each team would win 50% of the games,
i.e. approximately an even split.
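
KRACH ratings sit on a ratio scale rather than a goal scale. In the Bradley-Terry model that KRACH is built on, a team's expected share of wins against an opponent is its rating divided by the sum of the two ratings. A minimal Python sketch of that calculation follows; the function name krach_win_prob is made up for this example, and the ratings are copied from the table above.

```python
def krach_win_prob(k_team, k_opp):
    """Expected share of wins for a team, given the two KRACH ratings."""
    return k_team / (k_team + k_opp)

# Team_D (76.13) against Team_A (79.14)
p_d_over_a = krach_win_prob(76.13, 79.14)
print(f"Team_D over Team_A: {p_d_over_a:.3f}")   # Team_D over Team_A: 0.490

# Team_C (208.29) against Team_F (35.35)
p_c_over_f = krach_win_prob(208.29, 35.35)
print(f"Team_C over Team_F: {p_c_over_f:.3f}")   # Team_C over Team_F: 0.855
```

So the "even split" for Team_D vs Team_A is really about 49% to 51%, while Team_C would be expected to win roughly 85% of a series against Team_F.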

As of Dec. 24th, 2021, the latest profile run contained the following:

2052 games played by 496 teams  =  1556 DOFs (degrees of freedom)

Break-down of teams included in the solution:

League       #teams   #classes
==========   ======   ============
MN varsity    148     2 (A & AA    )
MI varsity    135     3 (D1, D2, D3)
WI varsity     85     2 (D1 & D2)
ND varsity     20     1
SD varsity     11     1
MT varsity      8     1
MN JrGold      16
ND JrGold       9
MI JVPrep      41
Other          23     ; Other teams connected in the network







Answer to Team IDs (A=STA, B=CDH, C=Hermantown, D=HILL, E=BSM & F=Mahtomedi)

