Scoring Forecast models¶
This notebook explains how the Scorer class can be used to compute the following metrics:
- MAE (mean absolute error);
- MSE (mean squared error);
- CRPS (Continuous Ranked Probability Score);
- Log Score.
- Interval Score
- Weighted interval score (wis)
The CRPS and Log Score parameterize each prediction as a log-normal distribution. Details about how the log-normal parameters are computed are available here.
To see the plots below as interactive charts use alt.renderers.enable("default")
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv("API_KEY")
import altair as alt
alt.renderers.enable("png")
RendererRegistry.enable('png')
import numpy as np
import pandas as pd
from mosqlient import get_infodengue
from mosqlient.scoring import Scorer
To show the use of the class to compare the score of new predictions with the predictions registered on the platform it will be used the baseline model.
from datetime import date
from mosqlient.forecast import Arima
It will compare the predictions of id 77 and 78, which refer to the geocode 3304557, between 2022-01-02 and 2023-06-25.
So initially, we will get the data that will be used to train the model and generate the new predictions:
disease = 'dengue'
geocode = 3304557
end_date = date.today().strftime('%Y-%m-%d')
df = get_infodengue(api_key=api_key,
disease = 'dengue',
start_date = '2010-01-01',
end_date = '2023-12-31',
geocode = 3304557)
df['data_iniSE'] = pd.to_datetime(df['data_iniSE'])
df.set_index('data_iniSE', inplace = True )
df = df[['casos']].rename(columns = {'casos':'y'})
df = df.resample('W-SUN').sum()
df.head()
100%|██████████| 2/2 [00:01<00:00, 1.78requests/s]
| y | |
|---|---|
| data_iniSE | |
| 2010-01-03 | 30 |
| 2010-01-10 | 44 |
| 2010-01-17 | 46 |
| 2010-01-24 | 47 |
| 2010-01-31 | 68 |
Create the model:
m_arima = Arima(df = df)
m_arima
<mosqlient.forecast.baseline.Arima at 0x7937cfcd9a90>
Train the model:
model = m_arima.train(train_ini_date='2010-01-01', train_end_date = '2021-12-31')
Performing stepwise search to minimize aic ARIMA(2,1,2)(0,0,0)[0] intercept : AIC=-292.056, Time=0.15 sec ARIMA(0,1,0)(0,0,0)[0] intercept : AIC=-263.166, Time=0.05 sec ARIMA(1,1,0)(0,0,0)[0] intercept : AIC=-297.770, Time=0.03 sec ARIMA(0,1,1)(0,0,0)[0] intercept : AIC=-289.846, Time=0.06 sec ARIMA(0,1,0)(0,0,0)[0] : AIC=-265.069, Time=0.03 sec ARIMA(2,1,0)(0,0,0)[0] intercept : AIC=-302.929, Time=0.06 sec ARIMA(3,1,0)(0,0,0)[0] intercept : AIC=-302.144, Time=0.06 sec ARIMA(2,1,1)(0,0,0)[0] intercept : AIC=-309.026, Time=0.17 sec ARIMA(1,1,1)(0,0,0)[0] intercept : AIC=-301.338, Time=0.08 sec ARIMA(3,1,1)(0,0,0)[0] intercept : AIC=-307.027, Time=0.25 sec ARIMA(1,1,2)(0,0,0)[0] intercept : AIC=-305.684, Time=0.19 sec ARIMA(3,1,2)(0,0,0)[0] intercept : AIC=-306.429, Time=0.32 sec ARIMA(2,1,1)(0,0,0)[0] : AIC=-310.932, Time=0.05 sec ARIMA(1,1,1)(0,0,0)[0] : AIC=-303.191, Time=0.05 sec ARIMA(2,1,0)(0,0,0)[0] : AIC=-304.794, Time=0.09 sec ARIMA(3,1,1)(0,0,0)[0] : AIC=-308.934, Time=0.08 sec ARIMA(2,1,2)(0,0,0)[0] : AIC=inf, Time=0.19 sec ARIMA(1,1,0)(0,0,0)[0] : AIC=-299.603, Time=0.02 sec ARIMA(1,1,2)(0,0,0)[0] : AIC=-307.585, Time=0.09 sec ARIMA(3,1,0)(0,0,0)[0] : AIC=-304.014, Time=0.03 sec ARIMA(3,1,2)(0,0,0)[0] : AIC=-308.337, Time=0.11 sec Best model: ARIMA(2,1,1)(0,0,0)[0] Total fit time: 2.174 seconds
Generate the out-of-sample predictions:
df_out = m_arima.predict_out_of_sample(horizon = 4, end_date = '2023-06-25', plot = False)
df_out.head()
| lower_95 | upper_95 | lower_90 | upper_90 | lower_80 | upper_80 | lower_50 | upper_50 | pred | date | data | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.380135 | 5.611532 | 2.541974 | 5.220175 | 2.744047 | 4.806410 | 3.123123 | 4.194429 | 3.614649 | 2022-01-02 | 21.0 |
| 1 | 2.039724 | 5.982174 | 2.213044 | 5.457454 | 2.433656 | 4.915421 | 2.859450 | 4.139010 | 3.433217 | 2022-01-09 | 14.0 |
| 2 | 1.717638 | 6.491374 | 1.897142 | 5.785903 | 2.130666 | 5.076974 | 2.596284 | 4.099097 | 3.252114 | 2022-01-16 | 22.0 |
| 3 | 1.490893 | 7.007500 | 1.671502 | 6.119112 | 1.910814 | 5.247761 | 2.401294 | 4.084722 | 3.118683 | 2022-01-23 | 19.0 |
| 0 | 36.296197 | 129.720212 | 39.933812 | 116.222528 | 44.643670 | 102.575269 | 53.967930 | 83.592642 | 66.974354 | 2022-01-30 | 31.0 |
To compare the model with other predictions in the platform, it is necessary to provide the ids of the predictions, the dataframe of the new prediction with the columns date, pred, lower and upper, and the dataframe with the true values to be compared (df_true). This dataframe must contain the columns date and casos.
Redefine the data to compare the predictions:
data = df.reset_index()
data = data.rename(columns = {'data_iniSE': 'date', 'y':'casos'})
data.head()
| date | casos | |
|---|---|---|
| 0 | 2010-01-03 | 30.0 |
| 1 | 2010-01-10 | 44.0 |
| 2 | 2010-01-17 | 46.0 |
| 3 | 2010-01-24 | 47.0 |
| 4 | 2010-01-31 | 68.0 |
import mosqlient as mosq t = mosq.get_prediction_by_id(api_key=api_key, id=1087) t.to_dataframe()
%%time
score = Scorer(
api_key=api_key,
df_true=data,
ids=[1087],
pred=df_out,
)
CPU times: user 625 ms, sys: 8.02 ms, total: 633 ms Wall time: 4.06 s
The class above can be initialized with just the ids or pred parameter filled.
To see the MAE error for your model (key = pred) and the other use:
score.mae
{'pred': 129.92568967185332, '1087': 615.9091087085636}
To plot a bar chart:
score.plot_mae()
To see the MSE error for your model (key = pred) and the other use:
score.mse
{'pred': 41097.5227557785, '1087': 734525.5302135687}
To plot a bar chart:
score.plot_mse()
To see the CRPS score for your model (key = pred) and the other, use the code below. The first dict contains the score by each point, and the second one shows the mean of the score.
The CRPS score is computed using the formula below when dist='log_normal', then:
$$
\operatorname{CRPS}(\operatorname{LN}(\mu_i, \sigma_i), y_i) = y_i \left[ 2\Phi(y_i) - 1 \right] - 2\exp\left(\mu_i + \frac{\sigma_i^2}{2}\right) \left[ \Phi(\omega_i - \sigma_i) + \Phi\left(\frac{\sigma_i} {\sqrt{2}}\right) \right],
$$
where $\omega_i = \frac{\operatorname{log}y_i - \mu_i}{\sigma_i}$, in which $y_i$ refers to the cases observed in week $i$, $\mu_i$ is the forecast for week $i$ and $\sigma_i$ is the standard deviation of the forecast in week $i$.
To access the CRPS score use:
score.crps
({'pred': date
2022-10-09 12.824913
2022-10-16 17.968062
2022-10-23 18.934474
2022-10-30 23.340983
2022-11-06 64.025511
2022-11-13 26.025794
2022-11-20 11.524928
2022-11-27 12.545239
2022-12-04 37.302721
2022-12-11 27.401781
2022-12-18 34.742208
2022-12-25 28.648640
2023-01-01 53.749019
2023-01-08 75.912627
2023-01-15 74.952830
2023-01-22 71.067227
2023-01-29 75.661439
2023-02-05 67.915206
2023-02-12 128.651877
2023-02-19 115.924450
2023-02-26 385.320567
2023-03-05 429.230153
2023-03-12 439.599199
2023-03-19 469.786637
2023-03-26 194.497343
2023-04-02 291.302903
2023-04-09 340.958328
2023-04-16 464.517550
2023-04-23 240.054338
2023-04-30 280.421580
2023-05-07 372.901456
2023-05-14 475.294996
2023-05-21 158.646668
2023-05-28 194.717756
2023-06-04 257.840314
2023-06-11 306.453122
2023-06-18 278.878464
2023-06-25 148.616863
dtype: float64,
'1087': date
2022-10-09 122.928540
2022-10-16 157.128887
2022-10-23 105.394784
2022-10-30 16.183432
2022-11-06 60.799867
2022-11-13 27.547187
2022-11-20 90.372713
2022-11-27 141.283674
2022-12-04 142.760249
2022-12-11 118.622987
2022-12-18 76.834504
2022-12-25 37.037587
2023-01-01 3.640307
2023-01-08 139.797746
2023-01-15 187.328836
2023-01-22 97.553246
2023-01-29 101.146750
2023-02-05 17.767231
2023-02-12 158.715423
2023-02-19 52.664425
2023-02-26 371.737816
2023-03-05 458.311154
2023-03-12 694.875100
2023-03-19 964.551203
2023-03-26 1280.654963
2023-04-02 1077.862762
2023-04-09 1497.116676
2023-04-16 1279.683129
2023-04-23 1369.273390
2023-04-30 1740.582840
2023-05-07 1655.467209
2023-05-14 1585.906214
2023-05-21 1356.085433
2023-05-28 1218.369286
2023-06-04 1138.610388
2023-06-11 1282.560595
2023-06-18 1158.720704
2023-06-25 927.512307
dtype: float64},
{'pred': np.float64(176.53047804024365),
'1087': np.float64(602.9839354033297)})
To plot the score you can use:
score.plot_crps()
To see the Log score for your model (key = pred) and the other, use the code below. The first dict contains the score by each point, and the second one shows the mean of the score.
The Log score is computed using the formula below when dist='log_normal', then:
$$
\operatorname{LogS}(\operatorname{LN}(\mu_i, \sigma_i), y_i) = \log y_i + \log \sigma_i + \frac{1}{2} \log (2\pi) \\ + \frac{(\log y_i - \mu_i)^2}{2\sigma_i^2}. \nonumber
$$
To access the score use:
score.log_score
({'pred': date
2022-10-09 -4.726129
2022-10-16 -5.253850
2022-10-23 -5.267267
2022-10-30 -5.371019
2022-11-06 -7.092913
2022-11-13 -5.418675
2022-11-20 -4.837875
2022-11-27 -4.879481
2022-12-04 -5.305424
2022-12-11 -5.145771
2022-12-18 -5.313283
2022-12-25 -5.605567
2023-01-01 -5.736034
2023-01-08 -5.981004
2023-01-15 -6.106720
2023-01-22 -6.365071
2023-01-29 -6.175830
2023-02-05 -6.540963
2023-02-12 -6.594165
2023-02-19 -6.893570
2023-02-26 -7.549482
2023-03-05 -7.639048
2023-03-12 -7.809768
2023-03-19 -8.072596
2023-03-26 -7.535772
2023-04-02 -7.723401
2023-04-09 -8.117515
2023-04-16 -8.137527
2023-04-23 -7.581368
2023-04-30 -7.982317
2023-05-07 -8.140518
2023-05-14 -8.265051
2023-05-21 -7.453745
2023-05-28 -7.573456
2023-06-04 -7.726165
2023-06-11 -7.958195
2023-06-18 -7.798377
2023-06-25 -7.337817
dtype: float64,
'1087': date
2022-10-09 -100.000000
2022-10-16 -19.593705
2022-10-23 -17.156070
2022-10-30 -4.833135
2022-11-06 -8.321957
2022-11-13 -5.236594
2022-11-20 -19.428080
2022-11-27 -44.565292
2022-12-04 -25.792618
2022-12-11 -44.683243
2022-12-18 -22.798509
2022-12-25 -6.336284
2023-01-01 -3.640273
2023-01-08 -37.031695
2023-01-15 -48.397196
2023-01-22 -11.755754
2023-01-29 -12.795067
2023-02-05 -4.965794
2023-02-12 -27.163728
2023-02-19 -6.837575
2023-02-26 -54.529135
2023-03-05 -37.206181
2023-03-12 -71.426071
2023-03-19 -100.000000
2023-03-26 -100.000000
2023-04-02 -90.025644
2023-04-09 -100.000000
2023-04-16 -100.000000
2023-04-23 -100.000000
2023-04-30 -100.000000
2023-05-07 -100.000000
2023-05-14 -100.000000
2023-05-21 -100.000000
2023-05-28 -85.910048
2023-06-04 -100.000000
2023-06-11 -100.000000
2023-06-18 -100.000000
2023-06-25 -100.000000
dtype: float64},
{'pred': np.float64(-6.7108613693534975),
'1087': np.float64(-55.53762230565999)})
To plot this score:
score.plot_log_score()
To see the Interval score for your model (key = pred) and the other, use the code below. The first dict contains the score by each point, and the second one shows the mean of the score.
The Interval score is computed using the formula below:
$$ S^{int}_\alpha(l_i, u_i; y_i) = u_i - l_i + \cfrac{2}{\alpha}(l_i - y_i)I\{y_i < l_i\} + \cfrac{2}{\alpha}(y_i - u_i)I\{y_i > u_i\} $$
where $I$ is the indicator function, $\alpha$ the significance level of the interval, $u_i$ the upper value of the interval at week $i$ and $l_i$ the lower value.
To access the score use:
score.interval_score
({'pred': date
2022-10-09 923.241265
2022-10-16 1878.008815
2022-10-23 1644.530902
2022-10-30 1625.697892
2022-11-06 2417.187072
2022-11-13 1732.150236
2022-11-20 1160.923642
2022-11-27 1096.433919
2022-12-04 220.439307
2022-12-11 903.874574
2022-12-18 1001.645377
2022-12-25 2077.079350
2023-01-01 1242.102005
2023-01-08 1374.393674
2023-01-15 2344.654274
2023-01-22 3966.181015
2023-01-29 2528.934852
2023-02-05 5967.902669
2023-02-12 3586.822465
2023-02-19 6772.323339
2023-02-26 3588.503469
2023-03-05 6825.030504
2023-03-12 12246.957079
2023-03-19 18727.662898
2023-03-26 15923.606674
2023-04-02 15437.223105
2023-04-09 24869.754193
2023-04-16 20874.818525
2023-04-23 14448.943543
2023-04-30 24366.355724
2023-05-07 24632.452339
2023-05-14 24922.179653
2023-05-21 16793.401373
2023-05-28 15984.733347
2023-06-04 16142.772736
2023-06-11 19718.037105
2023-06-18 16975.379835
2023-06-25 13622.130894
dtype: float64,
'1087': date
2022-10-09 2460.000000
2022-10-16 3200.000000
2022-10-23 2453.018211
2022-10-30 1141.655159
2022-11-06 2145.199097
2022-11-13 87.534089
2022-11-20 1138.189569
2022-11-27 1860.855910
2022-12-04 1552.824911
2022-12-11 1430.841336
2022-12-18 1093.866721
2022-12-25 269.742994
2023-01-01 757.446619
2023-01-08 2072.897684
2023-01-15 2996.128771
2023-01-22 1251.874514
2023-01-29 1268.268822
2023-02-05 1858.860909
2023-02-12 2227.175310
2023-02-19 2396.444435
2023-02-26 9084.735014
2023-03-05 10638.192606
2023-03-12 15369.965961
2023-03-19 20825.364979
2023-03-26 27023.554078
2023-04-02 22944.582973
2023-04-09 31359.046605
2023-04-16 26687.779439
2023-04-23 28422.474098
2023-04-30 36043.345325
2023-05-07 34044.664859
2023-05-14 32285.035904
2023-05-21 27835.751033
2023-05-28 25252.096935
2023-06-04 23443.350206
2023-06-11 26204.903227
2023-06-18 23735.375054
2023-06-25 19142.480954
dtype: float64},
{'pred': np.float64(9225.380780004474),
'1087': np.float64(12473.829587120174)})
To visualise a plot with the score use:
score.plot_interval_score()
The weighted interval score (WIS) is compute using the equation below: $$ \text{WIS}(F, y) = \frac{1}{K + 1/2} \left( w_0|y - m| + \sum_{k=1}^K [w_K S^{int}_{\alpha_k} (l_K, u_K; y) ]\right), $$
by default, ( w_k = \frac{\alpha_k}{2} ) and ( w_0 = \frac{1}{2} ). In this equation, ( K ) denotes the number of intervals, and ( l_k ) and ( u_k ) represent the lower and upper bounds of the ( k )-th confidence interval, respectively. The implementation defines the ( \alpha_k ) values based on the names of the prediction columns.
score.wis
({'pred': date
2022-10-09 10.619696
2022-10-16 15.243582
2022-10-23 16.451777
2022-10-30 20.004967
2022-11-06 43.851187
2022-11-13 16.347491
2022-11-20 10.354658
2022-11-27 11.300675
2022-12-04 22.053271
2022-12-11 19.142186
2022-12-18 24.507445
2022-12-25 24.421750
2023-01-01 34.964983
2023-01-08 49.578230
2023-01-15 55.280560
2023-01-22 61.302048
2023-01-29 51.917378
2023-02-05 58.195013
2023-02-12 93.456592
2023-02-19 100.924584
2023-02-26 238.706284
2023-03-05 283.653105
2023-03-12 333.226960
2023-03-19 405.351061
2023-03-26 164.606193
2023-04-02 242.191037
2023-04-09 307.261889
2023-04-16 407.471574
2023-04-23 197.486172
2023-04-30 251.763673
2023-05-07 321.990235
2023-05-14 421.634808
2023-05-21 140.915998
2023-05-28 166.933728
2023-06-04 223.211572
2023-06-11 277.237545
2023-06-18 168.646092
2023-06-25 113.664595
dtype: float64,
'1087': date
2022-10-09 122.936667
2022-10-16 156.346565
2022-10-23 100.220575
2022-10-30 11.503594
2022-11-06 50.691970
2022-11-13 19.850031
2022-11-20 75.531717
2022-11-27 117.499088
2022-12-04 110.764907
2022-12-11 93.798040
2022-12-18 67.302732
2022-12-25 28.410710
2023-01-01 2.893301
2023-01-08 122.987177
2023-01-15 170.780828
2023-01-22 84.101707
2023-01-29 85.466473
2023-02-05 14.533089
2023-02-12 136.613243
2023-02-19 38.907105
2023-02-26 359.637551
2023-03-05 438.716546
2023-03-12 678.179959
2023-03-19 948.085210
2023-03-26 1264.948317
2023-04-02 1060.228668
2023-04-09 1482.110349
2023-04-16 1269.453166
2023-04-23 1359.683300
2023-04-30 1731.608802
2023-05-07 1648.215017
2023-05-14 1581.431792
2023-05-21 1350.354312
2023-05-28 1208.123399
2023-06-04 1132.963164
2023-06-11 1277.512309
2023-06-18 1153.506827
2023-06-25 922.194125
dtype: float64},
{'pred': np.float64(142.25975248568696),
'1087': np.float64(591.5287455391691)})
To visualise a plot with the score use:
score.plot_wis()
To generate a table with a summary of the scores, you can use the following:
score.summary
| mae | mse | crps | log_score | interval_score | wis | |
|---|---|---|---|---|---|---|
| id | ||||||
| pred | 129.925690 | 41097.522756 | 176.530478 | -6.710861 | 9225.380780 | 142.259752 |
| 1087 | 615.909109 | 734525.530214 | 602.983935 | -55.537622 | 12473.829587 | 591.528746 |
To generate a plot of the predictions, you can use the following:
score.plot_predictions()
The class will select the bigger range of dates that contains information about each prediction dataframe. If you want to see the model's performance in a lower range, use the method below. The new range of dates provided must be between score.min_date and score.max_date. Otherwise, it will return an error.
score.set_date_range(start_date = '2023-01-01', end_date='2023-06-01' )
In this case:
score.summary
| mae | mse | crps | log_score | interval_score | wis | |
|---|---|---|---|---|---|---|
| id | ||||||
| pred | 170.317371 | 5.630798e+04 | 245.503825 | -7.271587 | 12155.678942 | 199.491459 |
| 1087 | 802.048697 | 1.025145e+06 | 786.776870 | -63.258371 | 16483.894858 | 774.411787 |
score.plot_log_score()
score.plot_predictions()