In this vignette we explore using epinowcast to estimate COVID-19 hospitalisations by date of positive test in Germany stratified by age using several model specifications with different degrees of flexibility. We then evaluate the resulting nowcasts using visual checks, approximate leave-one-out (LOO) cross-validation using Pareto smoothed importance sampling, and out of sample scoring using the weighted interval score and other scoring measures for the single report date considered here. Before working through this vignette reading the model definition is advised (vignette("model-definition"))

## Packages

We use the epinowcast package, data.table and purrr for data manipulation, ggplot2 for plotting, knitr to produce tables of output, loo to approximately evaluate out of sample performance and scoringutils to evaluate out of sample forecast performance.

library(epinowcast)
library(data.table)
library(purrr)
library(ggplot2)
library(loo)
library(scoringutils)
library(knitr)

This vignette includes several models that take upwards of 30 minutes to fit to data on a moderately equipped laptop. To speed up model fitting if more CPUs are available set the number of threads used per chain to half the number of real cores available (here 6 as we are using 2 MCMC chains and have 12 real cores available). Note this may cause conflicts with other processes running on your computer and if this is an issue reduce the number of threads used.

threads <- 6
options(mc.cores = 2)

## Data

Nowcasting is effectively the estimation of reporting patterns for recently reported data. This requires data on these patterns for previous observations, which typically means the time series of data as reported on multiple consecutive days (in theory non-consecutive days could be used but this is not yet supported in epinowcast).

Here we use COVID-19 hospitalisations by date of positive test in Germany stratified by age group available from up to the 1st of September 2020 (with 40 days of data included prior to this) as an example of data available in real-time and hospitalisations by date of positive test available up to 20th of October to represent hospitalisations as finally reported. These data are sourced from the Robert Koch Institute via the Germany Nowcasting hub where they are deconvolved from weekly data and days with negative reported hospitalisations are adjusted.

We first filter out the data that would have been available on the 1st of September for the last 40 days.

nat_germany_hosp <- epinowcast::germany_covid19_hosp[location == "DE"]

retro_nat_germany <- nat_germany_hosp |>
enw_filter_report_dates(latest_date = "2021-09-01") |>
enw_filter_reference_dates(include_days = 40)
retro_nat_germany
#>       reference_date location age_group confirm report_date
#>               <IDat>   <fctr>    <fctr>   <int>      <IDat>
#>    1:     2021-07-23       DE       00+      30  2021-07-23
#>    2:     2021-07-24       DE       00+      31  2021-07-24
#>    3:     2021-07-25       DE       00+       8  2021-07-25
#>    4:     2021-07-26       DE       00+       9  2021-07-26
#>    5:     2021-07-27       DE       00+      35  2021-07-27
#>   ---
#> 6023:     2021-07-23       DE     05-14       1  2021-09-01
#> 6024:     2021-07-23       DE     15-34      21  2021-09-01
#> 6025:     2021-07-23       DE     35-59      39  2021-09-01
#> 6026:     2021-07-23       DE     60-79      21  2021-09-01
#> 6027:     2021-07-23       DE       80+       5  2021-09-01

Similarly we then find the data that were available on the 20th of October for these dates, which will serve as the target “true” data.

latest_nat_germany <- nat_germany_hosp |>
enw_filter_report_dates(latest_date ="2021-10-20") |>
enw_latest_data() |>
enw_filter_reference_dates(latest_date = "2021-09-01", include_days = 40)

## Data preprocessing

epinowcast works by assuming data has been preprocessed into the reporting format it requires, coupled with meta data for both reference and report dates. enw_preprocess_data() can be used for this, although users can also use the internal functions to produce their own custom preprocessing steps. It is at this stage that arbitrary groupings of observations can be defined, which will then be propagated throughout all subsequent modelling steps. Here we have data stratified by age, and hence grouped by age group, but in principle this could be any grouping or combination of groups independent of the reference and report date models. We furthermore assume a maximum delay required to make the model identifiable. We set this to 40 days due to evidence of long reporting delays in this example data. However, note that in most cases the majority of right truncation occurs in the first few days and that increasing the maximum delay has a non-linear effect on run-time (i.e. a model with a maximum delay of 20 days will be much faster to fit than with 40 days). Note also that under the current formulation delays longer than the maximum are ignored so that the adjusted estimate is really for data reported after the maximum delay rather than for finally reported data.

Another key modelling choice we make at this stage is to model overall hospitalisations jointly with age groups rather than as an aggregation of age group estimates. This implicitly assumes that aggregated and non-aggregated data are not comparable (which may or may not be the case) but that the reporting process shares some of the same mechanisms. Another way to approach this would be to only model age stratified hospitalisations and then to aggregate the nowcast estimates into total counts after fitting the model.

pobs <- enw_preprocess_data(retro_nat_germany, max_delay = 40, by = "age_group")
pobs
#>                     obs           new_confirm               latest
#>                  <list>                <list>               <list>
#> 1: <data.table[6020x9]> <data.table[6020x11]> <data.table[287x10]>
#>    missing_reference   reporting_triangle       metareference
#>               <list>               <list>              <list>
#> 1: <data.table[0x6]> <data.table[287x42]> <data.table[287x9]>
#>              metareport          metadelay  time snapshots        by groups
#>                  <list>             <list> <int>     <int>    <list>  <int>
#> 1: <data.table[560x12]> <data.table[40x4]>    41       287 age_group      7
#>    max_delay   max_date
#>        <num>     <IDat>
#> 1:        40 2021-09-01

## Models

Here we explore a range of increasingly complex models using subject area knowledge and posterior predictive checks to motivate modelling choices.

### Shared reporting delay distribution

We first explore a relatively simple model that assumes that reporting delays are fixed across age groups and time. As this model is the default we simply call epinowcast. As we want to make use of CmdStan’s support for within-chain parallelisation we first compile the default model with this enabled (because of this we also need to pass threads_per_chain to epinowcast).

multithread_model <- enw_model(threads = TRUE)

Note that here we use two chains each using 6 threads as a demonstration but in general using 4 chains is recommended. Also note that warm-up and sampling iterations have been set below default values to reduce compute requirements but this may not be sufficient for many real world use cases. Finally, note that here we have silenced fitting progress and potential warning messages but in general this should not be done.

fit <- enw_fit_opts(
save_warmup = FALSE, output_loglik = TRUE, pp = TRUE,
iter_sampling = 500, iter_warmup = 500,
show_messages = FALSE, refresh = 0,
adapt_delta = 0.98, max_treedepth = 15
)
nowcast <- epinowcast(pobs,
fit = fit,
)
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 185.8 seconds.
#> Chain 2 finished in 254.7 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 220.3 seconds.
#> Total execution time: 254.9 seconds.

We first visualise the observations available to the model, the nowcast of final reported hospitalisations and the actual reported observations.

plot(nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")

### Using the inflated posterior as a prior

To speed up model fitting we make use of posterior information from the previous model (with some inflation) for some parameters. Note that this is not a truly Bayesian approach and in some situations may be problematic.

priors <- summary(
nowcast,
type = "fit",
variables = c( "refp_mean_int", "refp_sd_int", "sqrt_phi")
)
priors[, sd := sd * 2]
priors
#>            variable      mean    median         sd        mad        q5       q20       q80       q95     rhat ess_bulk ess_tail
#>              <char>     <num>     <num>      <num>      <num>     <num>     <num>     <num>     <num>    <num>    <num>    <num>
#> 1: refp_mean_int[1] 1.1216000 1.1195750 0.08808307 0.04474487 1.0478135 1.0855260 1.1585820 1.1935425 1.004192 1410.998 795.7473
#> 2:   refp_sd_int[1] 1.7296920 1.7269600 0.09428034 0.04573080 1.6514915 1.6905040 1.7705580 1.8108135 1.008058 2130.656 879.3806
#> 3:      sqrt_phi[1] 0.5947632 0.5951165 0.03426963 0.01748208 0.5655864 0.5798324 0.6092524 0.6205295 1.000538 1246.998 795.8234

### Reference day of the week effect

The underlying data we are trying to nowcast clearly has day of week periodicity. In our default model, a group specific random walk on the log of notified cases, this is not accounted for. Accounting for this, using a random effect on the day of the week is likely to improve nowcast performance and reduce the computation needed to fit the model. We can specify this using the enw_expectation() module and the formula interface as follows.

expectation_module <- enw_expectation(
~ 0 + (1 | day_of_week) + (1 | day:.group), data = pobs
)

exp_nowcast <- epinowcast(pobs,
expectation = expectation_module,
fit = fit,
priors = priors
)
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 327.8 seconds.
#> Chain 2 finished in 693.7 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 510.7 seconds.
#> Total execution time: 693.9 seconds.

We again visualise the nowcasts

plot(exp_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")

### Posterior predictions

In order to identify areas where the current model is poorly reproducing the data we plot the posterior predictions against the data. This plot is faceted age group and reference data with the y axis showing the number of observations reported on a given day for a given reference day and the x axis showing the report date. We see fairly clearly oscillations in reported cases every 7 days which is expressed in the plot by oscillations in each facet that appear to move from left to right across facets. This indicates that some kind of week day adjustment may be needed.

plot(exp_nowcast, type = "posterior") +
facet_wrap(vars(age_group, reference_date), scales = "free")

### Reporting day of the week effect

As noted using the posterior predictions from the simple model fit above there appears to be a day of the week effect for reported observations. To adjust for this we introduce a random effect for day of the week by date of report using the following helper function which uses the metadata produced by enw_preprocess_data(). Note that epinowcast uses a sparse design matrix to reduce runtimes for some modules so the design matrix shows only unique rows with index containing the mapping to the full design matrix.

report_module_dow <- enw_report(~ (1 | day_of_week), data = pobs)

We now repeat the nowcasting step with the day of the week reporting model included.

dow_nowcast <- epinowcast(pobs,
report = report_module_dow,
expectation = expectation_module,
fit = fit,
priors = priors
)
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 2 finished in 690.6 seconds.
#> Chain 1 finished in 774.6 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 732.6 seconds.
#> Total execution time: 774.8 seconds.

Nowcast performance looks visually improved but there is notable variation across age groups with the 35-59 year old nowcast appearing quite poor (and as a result the aggregate nowcast also not showing great performance). We could also plot the posterior predictions for this model in the same way as for the previous model.

plot(dow_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")

### Age group variation

It is quite likely that there is some variation in the reporting delay by age and that this may be driving the variation in nowcast performance noted for the last model. Here we model this using a random effect for 5 year age group (as these were the groups supplied in the data).

reference_module_age <- enw_reference(~ 1 + (1 | age_group), data = pobs)

We again nowcast this time using both the age adjusted reference date model and the day of the week adjusted report date model.

age_nowcast <- epinowcast(pobs,
reference = reference_module_age,
report = report_module_dow,
expectation = expectation_module,
fit = fit,
priors = priors
)
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 851.4 seconds.
#> Chain 2 finished in 966.5 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 909.0 seconds.
#> Total execution time: 966.7 seconds.

Fit looks slightly better with this adjustment though uncertainty has also increased for all age groups and performance for the final day of data may have reduced compared to the first model.

plot(age_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")

### Variation based on reference date

It could be the case that reporting delays change over time as well as across age groups. One way of modelling this is to assume piecewise constant variation over time modelled with a first order weekly random walk. An attractive property of this approach is that it limits the number of report date distributions that need to be evaluated in the model to the number of weeks of data and as this is an expensive computational step using this approach to introducing a time-varying parameter limits the additional computational overhead.

reference_module_age_week <- enw_reference(
~ 1 + (1 | age_group) + rw(week), data = pobs
)

As before we fit the nowcasting model,

week_nowcast <- epinowcast(pobs,
reference = reference_module_age_week,
report = report_module_dow,
expectation = expectation_module,
fit = fit,

priors = priors
)
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 1865.2 seconds.
#> Chain 2 finished in 1903.2 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 1884.2 seconds.
#> Total execution time: 1903.4 seconds.

In comparison to the previous model it looks like the introduction of variation over time has introduce a slight improvement in capturing hospitalisations in some age groups.

plot(week_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")

### Variation based on reference date stratified by age

As a final hierarchical model it makes sense to explore whether there is evidence that reporting delays vary by week and age group jointly. In this scenario the assumption is that delays may evolve differently over time for each age group but reporting effects and measurement error are still shared across data sets.

reference_module_week_by_age <- enw_reference(
~ 1 + (1 | age_group) + rw(week, by = age_group), data = pobs
)

We can now fit this model as before.

age_week_nowcast <- epinowcast(pobs,
reference = reference_module_week_by_age,
report = report_module_dow,
expectation = expectation_module,
fit = fit,
priors = priors
)
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 1936.7 seconds.
#> Chain 2 finished in 2058.0 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 1997.3 seconds.
#> Total execution time: 2058.3 seconds.

In comparison to the previous model it looks like the introduction of variation over time has introduce a slight improvement in capturing hospitalisations in some age groups.

plot(age_week_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")

### Independent models for each age group.

The obvious question to ask at this stage is if using a model that jointly fits to all age groups at once is actually beneficial. Here we explore this by fitting the same model as previously (a day of week effect for report date and a random walk on the week of the reference date stratified by age) to each age group independently.

We could define this model with a single call to epinowcast but fitting each dataset independently but in a joint setting would likely lead to long fit times for no real benefit. Instead here we write a small helper function to preprocess our input data, define report and reference date models and then run a nowcast.

independent_epinowcast <- function(obs, max_delay = 40, ...) {
pobs_ind <- enw_preprocess_data(obs, max_delay = max_delay)

nowcast <- epinowcast(
data = pobs_ind,
reference = enw_reference(~ rw(week), data = pobs_ind),
report = enw_report(~ (1 | day_of_week), data = pobs_ind),
expectation = enw_expectation(
~ 0 + (1 | day_of_week) + (1 | day), data = pobs_ind
),
...
)

nowcast_summary <- summary(
nowcast,
probs = c(0.025, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.975)
)
return(nowcast_summary)
}

We can now use this wrapper function on the data available for each age group, summarise the resulting nowcast, and then join these into a single data frame.

options(mc.cores = 2)

independent_nowcast <- map(
split(retro_nat_germany, by = "age_group"),
independent_epinowcast,
fit = enw_fit_opts(
save_warmup = FALSE, output_loglik = TRUE, pp = TRUE,
iter_sampling = 500, iter_warmup = 500,
show_messages = FALSE, refresh = 0,
adapt_delta = 0.95, max_treedepth = 12
),
priors = priors
)
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 142.0 seconds.
#> Chain 2 finished in 160.4 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 151.2 seconds.
#> Total execution time: 160.7 seconds.
#>
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 78.9 seconds.
#> Chain 2 finished in 93.3 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 86.1 seconds.
#> Total execution time: 93.6 seconds.
#>
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 69.4 seconds.
#> Chain 2 finished in 82.3 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 75.9 seconds.
#> Total execution time: 82.6 seconds.
#>
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 91.8 seconds.
#> Chain 2 finished in 102.7 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 97.2 seconds.
#> Total execution time: 102.9 seconds.
#>
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 2 finished in 137.2 seconds.
#> Chain 1 finished in 142.7 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 140.0 seconds.
#> Total execution time: 142.8 seconds.
#>
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 2 finished in 91.3 seconds.
#> Chain 1 finished in 99.8 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 95.5 seconds.
#> Total execution time: 100.0 seconds.
#>
#> Running MCMC with 2 parallel chains, with 6 thread(s) per chain...
#>
#> Chain 1 finished in 89.0 seconds.
#> Chain 2 finished in 99.6 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 94.3 seconds.
#> Total execution time: 99.8 seconds.
independent_nowcast <- rbindlist(independent_nowcast)

As we now have the summarised nowcasts rather than an object of class epinowcast we need to make use of the underlying plot function ourselves. Doing so we see that performance is generally quite good across the board though the width of credible intervals has also increased. Importantly the 35-59 year old age group is being captured at least as well as in the heirarchical models with only minor reductions in performance in other age groups. This suggests that for this dataset and nowcast date there may be relatively little benefit to jointly modelling age groups.

enw_plot_nowcast_quantiles(
independent_nowcast, latest_obs = latest_nat_germany
) +
facet_wrap(vars(age_group), scales = "free_y")

### Alternative models

In all the models defined above we have assumed that the delay distribution, aside from report day effects, is parametric and has a lognormal distribution. Both of these assumptions may be less than optimal. Alternatives include assuming a different distributional form (such as the gamma distribution which is also supported by epinowcast) or assuming that the report delay is fully non-parametric which is not yet supported but will be in future package versions.

There are any number of additional models we could explore within the framework supported by epinowcast as well as a large number of alternative parameterisations that are not yet supported. For example, we could explore models with more complex reporting day effects, including holidays (supported in epinowcast either as a separate effect or by assuming they have the same reporting hazard as Sundays) and variation over time which would represent reporting delays changing independently of reference date (this would be similar to the time varying model we defined above but with this effect occurring in the report date model rather than in the reference date model). These choices are data dependent and domain knowledge needs to be used to assess the likely mechanisms.

If interested in expanding the functionality of the underlying model to address some of these issues note that epinowcast allows users to pass in their own models meaning that alternative parameterisations may be easily tested within the package infrastructure. Once this testing has been done alterations that increase the flexibility of the package model and improves its defaults are very welcome as pull requests.

## Evaluation

As we have only nowcast a single date, and we visualised performance as we went, this evaluation is anything but complete or rigorous but we can give some examples of how we might evaluate performance more generally and potentially draw some useful initial conclusions.

We first list all models (including the simplest case) and give them informative names,

nowcasts <- list(
"Reference: Fixed, Report: Fixed" = exp_nowcast,
"Reference: Fixed, Report: Day of week" = dow_nowcast,
"Reference: Age, Report: Day of week" = age_nowcast,
"Reference: Age and week, Report: Day of week" = week_nowcast,
"Reference: Age and week by age, Report: Day of week" = age_week_nowcast
)

and then summarise the nowcast posterior for each model and join into a tidy data frame to make further analysis easier.

summarised_nowcasts <- map(
nowcasts, summary,
probs = c(0.025, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.975)
)
summarised_nowcasts$Independent by age, Reference: Week, Report: Day of week <- independent_nowcast # nolint summarised_nowcasts <- rbindlist(summarised_nowcasts, idcol = "model", use.names = TRUE) summarised_nowcasts[, :=( model = factor( model, levels = c("Reference: Fixed, Report: Fixed", "Reference: Fixed, Report: Day of week", "Reference: Age, Report: Day of week", "Reference: Age and week, Report: Day of week", "Reference: Age and week by age, Report: Day of week", "Independent by age, Reference: Week, Report: Day of week")), age_group = factor( age_group, levels = c("00+", "00-04", "05-14", "15-34", "35-59", "60-79", "80+")) )] This allows us to plot nowcasts for each model and age group compared to the latest data. Looking at the plot shows some small differences across models with uncertainty generally decreasing as model complexity increases. Some age groups are clearly better nowcast than others with the 35-59 year old age group in particular having poor nowcast coverage. enw_plot_nowcast_quantiles( summarised_nowcasts, latest_obs = latest_nat_germany ) + facet_grid(vars(age_group), vars(model), scales = "free_y") As a crude measure of general out of sample performance we can use the leave one out information criterion as supplied by the loo package though note this is not typically appropriate for time series data (where approximate LFO cross validation is likely to perform better), the approximation used here to avoid refitting is likely to be poor, and we are not accounting for this by refitting the model as required. loos <- map(nowcasts, ~ .$fit[[1]]\$loo())
loo_compare(loos)
#>                                                     elpd_diff se_diff
#> Reference: Age and week, Report: Day of week           0.0       0.0
#> Reference: Age and week by age, Report: Day of week  -10.3       3.7
#> Reference: Age, Report: Day of week                  -11.0       5.5
#> Reference: Fixed, Report: Day of week                -48.6      10.2
#> Reference: Fixed, Report: Fixed                     -609.5      45.1

We see that the most model which includes day of the week effects for the date of report substantially outperformed the baseline model with no adjustment and that the more complex models that adjusted for variation by age and week for the date of test improved estimated out of sample performance but uncertainty around these estimates was wide.

More rigorously, we can evaluate the nowcasts using proper scoring rules from the scoringutils package including the weighted interval score. Here we limit the nowcasts scored to the last 7 days of data to make interpretation easier, transform nowcasts into format required for scoringutils, link with the latest available data, and the finally call scoringutils::eval_forecasts(). Note that as we are only scoring a single nowcasts it is difficult to generalise our findings as this one day of reporting may have been unusual. To have a more informed view of which model to pick we would ideally nowcast a range of dates and evaluate each of them.

As a first step we score overall performance. Here we see that the baseline model with no variation actually performs very well with only models that include at least day of the week, age groups and variation by week performing comparably. Other performance characteristics are relatively similar across models (with all models being biased towards underprediction for example).

score <- enw_score_nowcast(
summarised_nowcasts,
latest_nat_germany[reference_date > (max(reference_date) - 7)]
)

score |>
summarise_scores(by = "model") |>
kable()
model interval_score dispersion underprediction overprediction coverage_deviation bias ae_median
Reference: Fixed, Report: Fixed 9.818798 3.139237 4.339090 2.3401256 -0.0930926 -0.0887755 16.24490
Reference: Fixed, Report: Day of week 10.083115 2.355764 6.779435 0.9483516 -0.1323391 -0.2265306 15.54082
Reference: Age, Report: Day of week 8.562323 2.317488 5.592465 0.6524333 -0.1354788 -0.3581633 13.86735
Reference: Age and week, Report: Day of week 7.569366 2.899246 1.156672 3.5136578 -0.0255887 0.1285714 12.42857
Reference: Age and week by age, Report: Day of week 6.797542 3.034496 1.942857 1.8197802 0.0215071 -0.1357143 11.28571
Independent by age, Reference: Week, Report: Day of week 7.277953 3.854562 1.538305 1.8845212 0.0419152 -0.0591837 12.26531

Finally we look across all scores relative to the simple model with no variation. This nicely captures the role of the last data point on performance but also highlights more variation across reference dates and age groups between models. The difference in performance between the hierarchical by age models and the model that treats age groups independently is also very clear.

age_date_score <- score |>
summarise_scores(by = c("model", "reference_date", "age_group"))
fixed_score <- age_date_score[
model %in% "Reference: Fixed, Report: Fixed",
.(reference_date, age_group, fixed_is = interval_score)
]
age_date_score <- merge(
age_date_score, fixed_score, by = c("reference_date", "age_group")
)

age_date_score <- age_date_score[, interval_score := interval_score / fixed_is]
age_date_score <- age_date_score[!model %in% "Reference: Fixed, Report: Fixed"]
plot <- ggplot(age_date_score) +
aes(x = reference_date, y = interval_score, col = model) +
geom_hline(yintercept = 1, linetype = 2, size = 1.2, alpha = 0.5) +
geom_line(size = 1.1, alpha = 0.6) +
geom_point(size = 1.2) +
facet_wrap(vars(age_group)) +
scale_color_brewer(palette = "Dark2") +
scale_y_log10(labels = scales::percent)

plot <- enw_plot_theme(plot) +
labs(x = "Reference date",
y = "Weighted interval score (relative to Reference: Fixed, Report: Fixed model)") + # nolint
guides(col = guide_legend(title = "Model", ncol = 2))
plot

## Summary

In this vignette we showcased using epinowcast to nowcast age stratified COVID-19 hospitalisations in Germany by date of test with a series of increasingly complex models motivated by the data. We also showed some simple methods for exploring these nowcasts and evaluating them.

Using the limited information available to us (as we have only nowcast a single date and used performance for this date to motivate new models) it appears that all models performed acceptably and that, aside from the last data point, models with age and day of the week effects likely performed better. It is also fairly clear that performance degrades as the amount of reported data is reduced which intuitively makes sense with performance being particularly sensitive the first day reported data is available (i.e “now”). Our apparent finding that delays evolve fairly independently across age groups motivates choosing a model that is very flexible to this, at least for the date of reference model.

Despite the independent model and the fixed effect model both doing well overall, for most applications if choosing a model based on this evaluation, I would likely select a relatively flexible model (with day of the week, age group, and age stratified weekly variation) relying on the hierarchical structure to limit overfitting, and excepting a small reduction in performance in some edge cases (with the hope that these are edge cases and not a common feature of the data). However in practice, I would want to explore nowcasting and evaluating more dates and if possible a greater range of model structures (as discussed in the alternative modelling section of this vignette). Note that with the proper scoring approach taken here to understand model performance (and commonly used in the literature) we are ranking models based on absolute errors and so groups with high counts (such as the 35-59 age group) are more important to nowcast correctly than groups with smaller counts (such as those aged 80+).