Ensemble predictions
This notebook provides examples of how the mosqlient package can be used to apply the proposed ensemble methodologies.
import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv("API_KEY")
import numpy as np
import pandas as pd
import mosqlient as mosq
from mosqlient import get_prediction_by_id
from mosqlient.forecast import Ensemble
from mosqlient.prediction_optimize import get_df_pars
from mosqlient.forecast.viz import plot_preds
Get a prediction:
df = get_prediction_by_id(api_key=api_key, id=1087).to_dataframe()
df = df.dropna(axis =1)
df.head()
| | date | lower_90 | pred | upper_90 |
|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 |
Apply the log normal parametrization for the predictions:
The cell below fits a log-normal distribution to each prediction by minimizing the loss against the lower and upper bounds of the prediction interval.
df_pars = get_df_pars(df, dist = 'log_normal', fn_loss = 'lower', return_estimations=True)
df_pars.head()
| | date | lower_90 | pred | upper_90 | mu | sigma | fit_med | fit_lwr | fit_upr |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | -2.878532 | 0.350151 | 0.056217 | 0.031604 | 0.100000 |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | 0.227887 | 0.926854 | 1.255944 | 0.273446 | 5.768582 |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | 3.313879 | 0.315959 | 27.491550 | 16.349090 | 46.227977 |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | 4.652431 | 0.212472 | 104.839540 | 73.917242 | 148.697771 |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | 4.532479 | 0.229604 | 92.988758 | 63.740045 | 135.658975 |
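When both interval bounds are strictly positive, matching them exactly has a closed form for the log-normal distribution. The sketch below is not the optimizer used by `get_df_pars`; the helper name `lognormal_from_bounds` is hypothetical and assumes the bounds are the 5% and 95% quantiles of a 90% interval.

```python
import math
from statistics import NormalDist


def lognormal_from_bounds(lower, upper, conf_level=0.9):
    """Closed-form log-normal (mu, sigma) that matches a central interval.

    Assumes `lower` and `upper` are the (1 - conf_level)/2 and
    (1 + conf_level)/2 quantiles and that 0 < lower < upper.
    """
    if lower <= 0 or upper <= lower:
        raise ValueError("need 0 < lower < upper")
    z = NormalDist().inv_cdf((1 + conf_level) / 2)  # about 1.645 for 90%
    log_lo, log_hi = math.log(lower), math.log(upper)
    mu = (log_lo + log_hi) / 2            # midpoint on the log scale
    sigma = (log_hi - log_lo) / (2 * z)   # half-width divided by z
    return mu, sigma


# Bounds taken from the 2022-10-23 row of the table above.
mu, sigma = lognormal_from_bounds(16.349089, 46.227978)
print(mu, sigma)
```

For that row the result should land close to the `mu` and `sigma` columns reported above. Rows whose lower bound is zero cannot be handled this way, since the logarithm of zero is undefined.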
The cell below fits a log-normal distribution to each prediction by minimizing the loss against the median and the upper bound of the prediction interval.
df_pars = get_df_pars(df, dist = 'log_normal', fn_loss = 'median', return_estimations=True)
df_pars.head()
| | date | lower_90 | pred | upper_90 | mu | sigma | fit_med | fit_lwr | fit_upr |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | -2.878532 | 0.350151 | 0.056217 | 0.031604 | 0.100000 |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | 0.227887 | 0.926854 | 1.255944 | 0.273446 | 5.768582 |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | 3.291018 | 0.329858 | 26.870205 | 15.618414 | 46.227994 |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | 4.644076 | 0.217551 | 103.967295 | 72.692403 | 148.697774 |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | 4.572626 | 0.205196 | 96.797994 | 69.069167 | 135.658965 |
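Matching the median and the upper bound also has a closed form when the median is strictly positive: the log of the median gives `mu`, and the distance to the upper bound on the log scale gives `sigma`. This is a sketch under those assumptions, with a hypothetical helper name, and not the numerical fit performed by the package.

```python
import math
from statistics import NormalDist


def lognormal_from_median_upper(median, upper, conf_level=0.9):
    """Closed-form log-normal (mu, sigma) from the median and upper bound.

    Assumes `upper` is the (1 + conf_level)/2 quantile and 0 < median < upper.
    """
    if median <= 0 or upper <= median:
        raise ValueError("need 0 < median < upper")
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    mu = math.log(median)                 # the log-normal median is exp(mu)
    sigma = (math.log(upper) - mu) / z    # upper quantile is exp(mu + z * sigma)
    return mu, sigma


# Median and upper bound from the 2022-10-23 row of the table above.
print(lognormal_from_median_upper(26.870209, 46.227978))
```

Because only two of the three reported values are matched, the fitted lower bound (`fit_lwr`) generally differs from `lower_90`, as the table above shows.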
The cell below fits a normal distribution to each prediction by minimizing the loss against the median and the upper bound of the prediction interval.
df_pars = get_df_pars(df, dist = 'normal', fn_loss='median', return_estimations=True)
df_pars.head()
| | date | lower_90 | pred | upper_90 | mu | sigma | fit_med | fit_lwr | fit_upr |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | 0.000000 | 0.060791 | 0.000000 | -0.099992 | 0.099992 |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | 0.000000 | 3.507061 | 0.000000 | -5.768603 | 5.768603 |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | 26.870205 | 11.768726 | 26.870205 | 7.512373 | 46.228037 |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | 103.967297 | 27.194206 | 103.967297 | 59.236809 | 148.697785 |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | 96.798063 | 23.625725 | 96.798063 | 57.937204 | 135.658923 |
The cell below fits a normal distribution to each prediction by minimizing the loss against the lower and upper bounds of the prediction interval.
df_pars = get_df_pars(df, dist = 'normal', fn_loss='lower', return_estimations=True)
df_pars.head()
| | date | lower_90 | pred | upper_90 | mu | sigma | fit_med | fit_lwr | fit_upr |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | 0.000000 | 0.060791 | 0.000000 | -0.099992 | 0.099992 |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | 0.000000 | 3.507061 | 0.000000 | -5.768603 | 5.768603 |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | 31.288492 | 9.082522 | 31.288492 | 16.349073 | 46.227911 |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | 111.307545 | 22.731671 | 111.307545 | 73.917274 | 148.697816 |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | 99.699502 | 21.861788 | 99.699502 | 63.740061 | 135.658942 |
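The two normal fits above can be approximated in closed form as well. The helpers below are hypothetical illustrations that assume a 90% central interval; they are not the package's implementation.

```python
from statistics import NormalDist


def normal_from_bounds(lower, upper, conf_level=0.9):
    """Normal (mu, sigma) whose central interval is [lower, upper]."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    return (lower + upper) / 2, (upper - lower) / (2 * z)


def normal_from_median_upper(median, upper, conf_level=0.9):
    """Normal (mu, sigma) matching the median and the upper bound."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    return median, (upper - median) / z


# Values from the 2022-10-23 row of the tables above.
print(normal_from_bounds(16.349089, 46.227978))
print(normal_from_median_upper(26.870209, 46.227978))
```

A normal distribution is symmetric, so matching the median and upper bound of a forecast with a zero median places part of the fitted interval below zero. This is visible in the negative `fit_lwr` values of the first two rows.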
Comparing the Ensemble techniques
Load the predictions that will be used to generate the ensemble
preds = []
# Download each prediction and tag it with the ID of the model that produced it
for pred_id in [1087, 5190]:
    pred_ = get_prediction_by_id(api_key = api_key, id=pred_id)
    df_pred = pred_.to_dataframe()
    df_pred = df_pred.dropna(axis = 1)
    df_pred['model_id'] = pred_.model.id
    preds.append(df_pred)
df_preds_end = pd.concat(preds)
df_preds_end.date = pd.to_datetime(df_preds_end.date)
df_preds_end.head()
| | date | lower_90 | pred | upper_90 | model_id | lower_95 | lower_80 | lower_50 | upper_50 | upper_80 | upper_95 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
df_preds_end.model_id.unique()
array([ 2, 37])
Load the data on probable cases that will be used to optimize the ensemble weights:
data = mosq.get_infodengue(
api_key = api_key,
disease = "dengue",
start_date = "2023-10-08",
end_date = "2024-06-02",
uf = 'RJ')
data = data[['data_iniSE', 'casprov']].set_index('data_iniSE')
data.index = pd.to_datetime(data.index)
data = data.resample('W-SUN').sum()
data.head()
| data_iniSE | casprov |
|---|---|
| 2023-10-08 | 351 |
| 2023-10-15 | 451 |
| 2023-10-22 | 470 |
| 2023-10-29 | 519 |
| 2023-11-05 | 734 |
Plot the predictions versus the probable cases observed:
plot_preds(data, df_preds_end)
<Axes: xlabel='Date', ylabel='New cases'>
Apply the ensemble methodology
df_preds_end = df_preds_end.loc[df_preds_end.date <= '2024-06-02'].reset_index(drop=True)
Apply the linear mixture with equal weights for all predictions:
df_preds_end
| | date | lower_90 | pred | upper_90 | model_id | lower_95 | lower_80 | lower_50 | upper_50 | upper_80 | upper_95 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 2022-11-13 | 133.376704 | 170.189145 | 229.452077 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 2022-11-20 | 150.909478 | 197.377438 | 244.897052 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | 2022-11-27 | 179.042796 | 241.244720 | 290.712320 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 2022-12-04 | 175.641246 | 259.566424 | 330.099290 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | 2022-12-11 | 194.542067 | 252.070420 | 286.937727 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 10 | 2022-12-18 | 171.693336 | 203.195434 | 234.954249 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 11 | 2022-12-25 | 176.487150 | 212.922438 | 259.841086 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 12 | 2023-01-01 | 192.127669 | 227.419877 | 253.125454 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 13 | 2023-01-08 | 321.644884 | 370.603325 | 412.329270 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 14 | 2023-01-15 | 395.806439 | 448.519978 | 497.902737 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 15 | 2023-01-22 | 374.593726 | 427.719926 | 488.571152 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 16 | 2023-01-29 | 446.413441 | 501.437462 | 557.764181 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 17 | 2023-02-05 | 432.056955 | 495.397537 | 572.025012 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 18 | 2023-02-12 | 483.358766 | 547.369900 | 600.881742 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 19 | 2023-02-19 | 387.177778 | 436.779432 | 488.832355 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 20 | 2023-02-26 | 415.763249 | 481.257915 | 530.659530 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 21 | 2023-03-05 | 413.090370 | 461.358629 | 534.832998 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 22 | 2023-03-12 | 356.501702 | 408.128539 | 471.937586 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 23 | 2023-03-19 | 340.731751 | 395.592612 | 458.988728 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 24 | 2023-03-26 | 228.822296 | 278.537932 | 338.807836 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 25 | 2023-04-02 | 272.770851 | 319.162652 | 385.723294 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 26 | 2023-04-09 | 215.047670 | 265.869341 | 323.839392 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 27 | 2023-04-16 | 167.611028 | 208.444262 | 248.650073 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 28 | 2023-04-23 | 150.876295 | 189.695946 | 227.456065 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 29 | 2023-04-30 | 116.832734 | 165.707516 | 202.517266 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 30 | 2023-05-07 | 104.766757 | 141.384126 | 170.786393 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 31 | 2023-05-14 | 116.748205 | 138.878083 | 156.921958 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 32 | 2023-05-21 | 87.212448 | 114.945610 | 138.036952 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 33 | 2023-05-28 | 65.395153 | 96.019236 | 135.300553 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 34 | 2023-06-04 | 65.832490 | 91.582336 | 114.181539 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 35 | 2023-06-11 | 37.754839 | 58.521760 | 78.508332 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 36 | 2023-06-18 | 49.231247 | 70.169645 | 90.729868 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 37 | 2023-06-25 | 38.875952 | 61.181980 | 82.288447 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 38 | 2023-07-02 | 31.252156 | 46.885395 | 69.226474 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 39 | 2023-07-09 | 15.032011 | 33.220149 | 58.683160 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 40 | 2023-07-16 | 8.239270 | 22.962246 | 38.741835 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 41 | 2023-07-23 | 15.645944 | 31.645205 | 49.239203 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 42 | 2023-07-30 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 43 | 2023-08-06 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 44 | 2023-08-13 | 0.000000 | 0.000000 | 3.141718 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 45 | 2023-08-20 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 46 | 2023-08-27 | 1.889445 | 12.660959 | 34.473618 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 47 | 2023-09-03 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 48 | 2023-09-10 | 0.000000 | 0.000000 | 2.731096 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 49 | 2023-09-17 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 50 | 2023-09-24 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
| 51 | 2023-10-01 | 0.000000 | 0.000000 | 7.597729 | 2 | NaN | NaN | NaN | NaN | NaN | NaN |
df_ = get_df_pars(df_preds_end, conf_level=0.9, dist='log_normal', fn_loss='median')
df_.head()
| | date | lower_90 | pred | upper_90 | model_id | lower_95 | lower_80 | lower_50 | upper_50 | upper_80 | upper_95 | mu | sigma |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | -2.878532 | 0.350151 |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 0.227887 | 0.926854 |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 3.291018 | 0.329858 |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 4.644076 | 0.217551 |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | 4.572626 | 0.205196 |
df_["model_id"] = pd.Categorical(
df_["model_id"], categories=[22, 30, 34], ordered=True
)
df_.head()
/tmp/ipykernel_479457/79049107.py:1: Pandas4Warning: Constructing a Categorical with a dtype and values containing non-null entries not in that dtype's categories is deprecated and will raise in a future version. df_["model_id"] = pd.Categorical(
| | date | lower_90 | pred | upper_90 | model_id | lower_95 | lower_80 | lower_50 | upper_50 | upper_80 | upper_95 | mu | sigma |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-09 | 0.000000 | 0.000000 | 0.100000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -2.878532 | 0.350151 |
| 1 | 2022-10-16 | 0.000000 | 0.000000 | 5.768582 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.227887 | 0.926854 |
| 2 | 2022-10-23 | 16.349089 | 26.870209 | 46.227978 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.291018 | 0.329858 |
| 3 | 2022-10-30 | 73.917242 | 103.967272 | 148.697770 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.644076 | 0.217551 |
| 4 | 2022-11-06 | 63.740045 | 96.798032 | 135.658976 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.572626 | 0.205196 |
e1 = Ensemble(df = df_preds_end,
order_models = [22, 30, 34],
dist = 'log_normal',
mixture = 'linear',
fn_loss = 'median')
preds_e1 = e1.apply_ensemble(weights = np.array([1/3,1/3,1/3]))
preds_e1['model_id'] = 'E1 - equal weights'
df_preds_ensemble = pd.concat([df_preds_end, preds_e1], axis = 0)
plot_preds(data, df_preds_ensemble)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[16], line 7
      1 e1 = Ensemble(df = df_preds_end,
      2         order_models = [22, 30, 34],
      3         dist = 'log_normal',
      4         mixture = 'linear',
      5         fn_loss = 'median')
----> 7 preds_e1 = e1.apply_ensemble(weights = np.array([1/3,1/3,1/3]))
      9 preds_e1['model_id'] = 'E1 - equal weights'
     11 df_preds_ensemble = pd.concat([df_preds_end, preds_e1], axis = 0)

File ~/Projetos/Mosqlimate/mosqlimate-client/mosqlient/forecast/ensemble.py:502, in Ensemble.apply_ensemble(self, weights, p)
    493     quantiles = get_quantiles_log(
    494         self.dist,
    495         weights=weights,
    (...)
    498         p=p,
    499     )
    501 if self.mixture == "linear":
--> 502     quantiles = get_quantiles_linear(
    503         self.dist, weights=weights, preds=preds_, p=p
    504     )
    506 df_ = pd.DataFrame([quantiles], columns=columns)
    508 df_["date"] = d

File ~/Projetos/Mosqlimate/mosqlimate-client/mosqlient/forecast/ensemble.py:954, in get_quantiles_linear(dist, weights, preds, p)
    951     quantiles = norm.ppf(p, loc=pool[0], scale=pool[1])
    953 if dist == "log_normal":
--> 954     quantiles = compute_ppf(
    955         mu=preds["mu"].values,
    956         sigma=preds["sigma"].values,
    957         weights=weights,
    958         p=p,
    959     )
    961 return quantiles

File ~/Projetos/Mosqlimate/mosqlimate-client/mosqlient/forecast/ensemble.py:599, in compute_ppf(mu, sigma, weights, p)
    576 """
    577 Compute the Percent-Point Function (PPF), which is the inverse of the CDF,
    578 for a mixture of lognormal distributions.
    (...)
    595     The x-values corresponding to the 5th, 50th, and 95th percentiles.
    596 """
    597 x = np.linspace(1e-6, 10**5, 10**5)
--> 599 pdf_values = dlnorm_mix(x, mu, sigma, weights, log=False)
    601 # Normalize the PDF using the trapezoidal rule
    602 dx = np.diff(x)  # Compute spacing between consecutive x-values

File ~/Projetos/Mosqlimate/mosqlimate-client/mosqlient/forecast/ensemble.py:549, in dlnorm_mix(obs, mu, sigma, weights, log)
    546 K = len(mu)  # Number of components
    548 if len(sigma) != K or len(weights) != K:
--> 549     raise ValueError("mu, sigma, and weights must have the same length")
    551 # Compute log-PDFs for each component in a vectorized manner
    552 ldens = np.array(
    553     [
    554         lognorm.logpdf(obs, s=sigma[i], scale=np.exp(mu[i]))
    555         for i in range(K)
    556     ]
    557 ).T  # Transpose to align with obs dimensions

ValueError: mu, sigma, and weights must have the same length
The call fails because three weights are supplied while the number of (mu, sigma) pairs available for a date is different. Note that `df_preds_end` was built from models 2 and 37, not from the three IDs listed in `order_models`, so the model IDs and the number of weights need to match the models actually present in the DataFrame.
Optimize the weights in the ensemble
Rename the columns of the dataset with the cases:
data_ens =data.reset_index().rename(columns = {'data_iniSE': 'date',
'casprov':'casos'})
data_ens.head()
Optimize the weights using the log score:
weights_ls = e1.compute_weights(data_ens, metric= 'log_score')
preds_e1_ls = e1.apply_ensemble(weights_ls['weights'])
preds_e1_ls['model_id'] = 'E1 - Log Score'
df_preds_ensemble = pd.concat([df_preds_end, preds_e1_ls], axis = 0)
plot_preds(data, df_preds_ensemble)
weights_ls
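To make the log-score criterion concrete, the sketch below scores a linear mixture of log-normals by the log of its density at the observed value (higher is better under this sign convention) and picks the weight of a two-model mixture by a coarse grid search. The function names and the grid search are illustrative assumptions; the notebook does not show how `compute_weights` performs the optimization.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def mixture_log_score(obs, mus, sigmas, weights):
    """Log of the linear-mixture density evaluated at the observation."""
    dens = sum(w * lognormal_pdf(obs, m, s)
               for w, m, s in zip(weights, mus, sigmas))
    # Guard against numerical underflow far in the tails.
    return math.log(dens) if dens > 0 else -math.inf


def best_weight_two_models(observations, params, steps=10):
    """Grid search over w in [0, 1] for a two-model mixture.

    `params` holds, per observation, ((mu1, sigma1), (mu2, sigma2)).
    Returns the weight of model 1 with the highest mean log score.
    """
    best_w, best_score = None, -math.inf
    for i in range(steps + 1):
        w = i / steps
        score = sum(
            mixture_log_score(y, (p1[0], p2[0]), (p1[1], p2[1]), (w, 1 - w))
            for y, (p1, p2) in zip(observations, params)
        ) / len(observations)
        if score > best_score:
            best_w, best_score = w, score
    return best_w


# Toy case: the observation sits at the median of model 1 and far from model 2.
print(best_weight_two_models([1.0], [((0.0, 0.5), (5.0, 0.5))]))
```

With more than two models the search runs over the full weight simplex, which is usually handled with a constrained numerical optimizer rather than a grid.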
Optimize the weights using the CRPS:
weights_crps = e1.compute_weights(data_ens, metric= 'crps')
preds_e1_crps = e1.apply_ensemble(weights = weights_crps['weights'])
preds_e1_crps['model_id'] = 'E1 - CRPS'
df_preds_ensemble = pd.concat([df_preds_end, preds_e1_crps], axis = 0)
plot_preds(data, df_preds_ensemble)
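The CRPS (continuous ranked probability score) measures the distance between a forecast distribution and the observed value, and lower is better. As a small illustration, the sketch below uses the well-known closed form for a normal forecast. It is only meant to show what the metric rewards; the computation that the package applies to log-normal forecasts and mixtures is not shown in this notebook.

```python
import math
from statistics import NormalDist


def crps_normal(obs, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at the observation."""
    std = NormalDist()
    z = (obs - mu) / sigma
    return sigma * (
        z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi)
    )


# The score grows as the observation moves away from the forecast centre,
# and a sharper forecast centred on the observation scores better.
print(crps_normal(100.0, 100.0, 10.0))
print(crps_normal(130.0, 100.0, 10.0))
print(crps_normal(100.0, 100.0, 5.0))
```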
Compare the ensemble outputs obtained with the linear mixture:
df_ensembles = pd.concat([preds_e1, preds_e1_ls, preds_e1_crps])
plot_preds(data, df_ensembles)
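In a linear mixture, the ensemble CDF is the weighted sum of the component CDFs, and the ensemble quantiles are obtained by inverting that sum. The sketch below does this for log-normal components with a bisection on the log scale. It assumes the weights sum to one, uses hypothetical function names, and is not the routine used inside `apply_ensemble`.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def mixture_cdf(x, mus, sigmas, weights):
    """CDF of a weighted mixture of log-normals at x > 0."""
    return sum(w * _STD.cdf((math.log(x) - m) / s)
               for w, m, s in zip(weights, mus, sigmas))


def mixture_quantile(p, mus, sigmas, weights, tol=1e-9):
    """Invert the mixture CDF by bisection on log(x)."""
    # Bracket: each component has essentially all its mass within 10 sigma.
    lo = min(m - 10 * s for m, s in zip(mus, sigmas))
    hi = max(m + 10 * s for m, s in zip(mus, sigmas))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture_cdf(math.exp(mid), mus, sigmas, weights) < p:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)


# Equal-weight mixture of two components; 5%, 50% and 95% quantiles.
mus, sigmas, weights = [3.0, 3.5], [0.3, 0.2], [0.5, 0.5]
print([mixture_quantile(p, mus, sigmas, weights) for p in (0.05, 0.5, 0.95)])
```

Unlike logarithmic pooling, a linear mixture can be multimodal when the components disagree, which tends to widen the resulting interval.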
Apply logarithmic pooling with equal weights:
e2 = Ensemble(df = df_preds_end,
order_models = [22, 30, 34],
dist = 'log_normal',
fn_loss = 'median')
preds_e2 = e2.apply_ensemble(weights = [1/3,1/3,1/3])
preds_e2['model_id'] = 'E2 - equal weights'
df_preds_ensemble = pd.concat([df_preds_end, preds_e2], axis = 0)
plot_preds(data, df_preds_ensemble)
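Logarithmic pooling combines densities as a weighted geometric mean. For log-normal components with weights that sum to one, the pooled density is again log-normal, with a precision equal to the weighted sum of the component precisions. The sketch below applies this standard result; the helper names are hypothetical and the code is not taken from the package.

```python
import math
from statistics import NormalDist


def log_pool_lognormal(mus, sigmas, weights):
    """Parameters (mu, sigma) of the log-pooled log-normal distribution.

    Assumes the weights are non-negative and sum to one.
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to one")
    # Weighted precisions of the components on the log scale.
    precisions = [w / s ** 2 for w, s in zip(weights, sigmas)]
    total = sum(precisions)
    mu = sum(p * m for p, m in zip(precisions, mus)) / total
    sigma = math.sqrt(1 / total)
    return mu, sigma


def lognormal_quantile(p, mu, sigma):
    """Quantile of a log-normal distribution."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


mu, sigma = log_pool_lognormal([3.0, 3.5], [0.3, 0.2], [0.5, 0.5])
print([lognormal_quantile(p, mu, sigma) for p in (0.05, 0.5, 0.95)])
```

The pooled distribution is always unimodal and gives more influence to the components with smaller `sigma`.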
Optimize the weights using the log score:
weights_ls = e2.compute_weights(data_ens, metric= 'log_score')
preds_e2_ls = e2.apply_ensemble(weights_ls['weights'])
preds_e2_ls['model_id'] = 'E2 - Log Score'
df_preds_ensemble = pd.concat([df_preds_end, preds_e2_ls], axis = 0)
plot_preds(data, df_preds_ensemble)
Optimize the weights using the CRPS:
weights_crps = e2.compute_weights(data_ens, metric= 'crps')
preds_e2_crps = e2.apply_ensemble(weights_crps['weights'])
preds_e2_crps['model_id'] = 'E2 - CRPS'
df_preds_ensemble = pd.concat([df_preds_end, preds_e2_crps], axis = 0)
plot_preds(data, df_preds_ensemble)
Compare the ensemble outputs obtained with logarithmic pooling:
df_ensembles = pd.concat([preds_e2, preds_e2_ls, preds_e2_crps])
plot_preds(data, df_ensembles)
Compare the linear mixture (E1) with the logarithmic pooling (E2):
df_ensembles = pd.concat([preds_e1, preds_e1_ls, preds_e1_crps,
preds_e2, preds_e2_ls, preds_e2_crps])
plot_preds(data, df_ensembles)
Compare logarithmic-pooling ensembles built with the normal and log-normal distributions:
e2_log = Ensemble(df = df_preds_end,
order_models = [22, 30, 34],
dist = 'log_normal',
fn_loss = 'median')
e2_log.compute_weights(data_ens, metric= 'crps')
# When no weights are passed to `apply_ensemble`, it uses the weights computed
# by the `compute_weights` method
preds_e2_log = e2_log.apply_ensemble()
preds_e2_log['model_id'] = 'E2 - Log Normal'
e2_norm = Ensemble(df = df_preds_end,
order_models = [22, 30, 34],
dist = 'normal',
fn_loss = 'lower')
e2_norm.compute_weights(data_ens, metric= 'crps')
preds_e2_norm = e2_norm.apply_ensemble()
preds_e2_norm['model_id'] = 'E2 - normal'
df_ensembles = pd.concat([preds_e2_log, preds_e2_norm])
plot_preds(data, df_ensembles)
df_ensembles.head()