# Hierarchical nowcasting of age stratified COVID-19 hospitalisations in Germany

#### Sam Abbott

Source:`vignettes/germany-age-stratified-nowcasting.Rmd`

`germany-age-stratified-nowcasting.Rmd`

In this vignette we explore using `epinowcast`

to estimate
COVID-19 hospitalisations by date of positive test in Germany stratified
by age using several model specifications with different degrees of
flexibility. We then evaluate the resulting nowcasts using visual
checks, approximate leave-one-out (LOO) cross-validation using Pareto
smoothed importance sampling, and out of sample scoring using the
weighted interval score and other scoring measures for the single report
date considered here. Before working through this vignette reading the
model definition is advised
(`vignette("model-definition")`

)

## Packages

We use the `epinowcast`

package, `data.table`

and `purrr`

for data manipulation, `ggplot2`

for
plotting, `knitr`

to produce tables of output,
`loo`

to approximately evaluate out of sample performance and
`scoringutils`

to evaluate out of sample forecast
performance.

```
library(epinowcast)
library(data.table)
library(purrr)
library(ggplot2)
library(loo)
library(scoringutils)
library(knitr)
```

This vignette includes several models that take upwards of 30 minutes to fit to data on a moderately equipped laptop. To speed up model fitting if more CPUs are available set the number of threads used per chain to half the number of real cores available (here 6 as we are using 2 MCMC chains and have 12 real cores available). Note this may cause conflicts with other processes running on your computer and if this is an issue reduce the number of threads used.

```
threads <- 6
options(mc.cores = 2)
```

## Data

Nowcasting is effectively the estimation of reporting patterns for
recently reported data. This requires data on these patterns for
previous observations, which typically means the time series of data as
reported on multiple consecutive days (in theory non-consecutive days
could be used but this is not yet supported in
`epinowcast`

).

Here we use COVID-19 hospitalisations by date of positive test in Germany stratified by age group available from up to the 1st of September 2020 (with 40 days of data included prior to this) as an example of data available in real-time and hospitalisations by date of positive test available up to 20th of October to represent hospitalisations as finally reported. These data are sourced from the Robert Koch Institute via the Germany Nowcasting hub where they are deconvolved from weekly data and days with negative reported hospitalisations are adjusted.

We first filter out the data that would have been available on the 1st of September for the last 40 days.

```
nat_germany_hosp <- epinowcast::germany_covid19_hosp[location == "DE"]
retro_nat_germany <- nat_germany_hosp |>
enw_filter_report_dates(latest_date = "2021-09-01") |>
enw_filter_reference_dates(include_days = 40)
retro_nat_germany
#> reference_date location age_group confirm report_date
#> 1: 2021-07-23 DE 00+ 30 2021-07-23
#> 2: 2021-07-24 DE 00+ 31 2021-07-24
#> 3: 2021-07-25 DE 00+ 8 2021-07-25
#> 4: 2021-07-26 DE 00+ 9 2021-07-26
#> 5: 2021-07-27 DE 00+ 35 2021-07-27
#> ---
#> 6023: 2021-07-23 DE 05-14 1 2021-09-01
#> 6024: 2021-07-23 DE 15-34 21 2021-09-01
#> 6025: 2021-07-23 DE 35-59 39 2021-09-01
#> 6026: 2021-07-23 DE 60-79 21 2021-09-01
#> 6027: 2021-07-23 DE 80+ 5 2021-09-01
```

Similarly we then find the data that were available on the 20th of October for these dates, which will serve as the target “true” data.

```
latest_nat_germany <- nat_germany_hosp |>
enw_filter_report_dates(latest_date ="2021-10-20") |>
enw_latest_data() |>
enw_filter_reference_dates(latest_date = "2021-09-01", include_days = 40)
```

## Data preprocessing

`epinowcast`

works by assuming data has been preprocessed
into the reporting format it requires, coupled with meta data for both
reference and report dates. `enw_preprocess_data()`

can be
used for this, although users can also use the internal functions to
produce their own custom preprocessing steps. It is at this stage that
arbitrary groupings of observations can be defined, which will then be
propagated throughout all subsequent modelling steps. Here we have data
stratified by age, and hence grouped by age group, but in principle this
could be any grouping or combination of groups independent of the
reference and report date models. We furthermore assume a maximum delay
required to make the model identifiable. We set this to 40 days due to
evidence of long reporting delays in this example data. However, note
that in most cases the majority of right truncation occurs in the first
few days and that increasing the maximum delay has a non-linear effect
on run-time (i.e. a model with a maximum delay of 20 days will be much
faster to fit than with 40 days). Note also that under the current
formulation delays longer than the maximum are ignored so that the
adjusted estimate is really for data reported after the maximum delay
rather than for finally reported data.

Another key modelling choice we make at this stage is to model overall hospitalisations jointly with age groups rather than as an aggregation of age group estimates. This implicitly assumes that aggregated and non-aggregated data are not comparable (which may or may not be the case) but that the reporting process shares some of the same mechanisms. Another way to approach this would be to only model age stratified hospitalisations and then to aggregate the nowcast estimates into total counts after fitting the model.

```
pobs <- enw_preprocess_data(retro_nat_germany, max_delay = 40, by = "age_group")
pobs
#> obs new_confirm latest
#> 1: <data.table[6020x9]> <data.table[6020x11]> <data.table[287x10]>
#> missing_reference reporting_triangle metareference
#> 1: <data.table[0x6]> <data.table[287x42]> <data.table[287x9]>
#> metareport metadelay time snapshots by groups
#> 1: <data.table[560x12]> <data.table[40x4]> 41 287 age_group 7
#> max_delay max_date
#> 1: 40 2021-09-01
```

## Models

Here we explore a range of increasingly complex models using subject area knowledge and posterior predictive checks to motivate modelling choices.

### Shared reporting delay distribution

We first explore a relatively simple model that assumes that
reporting delays are fixed across age groups and time. As this model is
the default we simply call `epinowcast`

. As we want to make
use of `CmdStan`

’s support for within-chain parallelisation
we first compile the default model with this enabled (because of this we
also need to pass `threads_per_chain`

to
`epinowcast`

).

`multithread_model <- enw_model(threads = TRUE)`

Note that here we use two chains each using 6 threads as a demonstration but in general using 4 chains is recommended. Also note that warm-up and sampling iterations have been set below default values to reduce compute requirements but this may not be sufficient for many real world use cases. Finally, note that here we have silenced fitting progress and potential warning messages but in general this should not be done.

```
fit <- enw_fit_opts(
save_warmup = FALSE, output_loglik = TRUE, pp = TRUE,
chains = 2, threads_per_chain = threads,
iter_sampling = 500, iter_warmup = 500,
show_messages = FALSE, refresh = 0,
adapt_delta = 0.98, max_treedepth = 15
)
nowcast <- epinowcast(pobs,
fit = fit,
model = multithread_model
)
```

We first visualise the observations available to the model, the nowcast of final reported hospitalisations and the actual reported observations.

```
plot(nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")
```

### Using the inflated posterior as a prior

To speed up model fitting we make use of posterior information from the previous model (with some inflation) for some parameters. Note that this is not a truly Bayesian approach and in some situations may be problematic.

```
priors <- summary(
nowcast,
type = "fit",
variables = c( "refp_mean_int", "refp_sd_int", "sqrt_phi")
)
priors[, sd := sd * 2]
priors
#> variable mean median sd mad q5
#> 1: refp_mean_int[1] 1.1207624 1.1204600 0.09072471 0.04619040 1.0462980
#> 2: refp_sd_int[1] 1.7311648 1.7296850 0.09567793 0.04728753 1.6547720
#> 3: sqrt_phi[1] 0.5955124 0.5953265 0.03710977 0.01839165 0.5659246
#> q20 q80 q95 rhat ess_bulk ess_tail
#> 1: 1.0818620 1.158388 1.1980045 0.9996327 1851.653 663.5543
#> 2: 1.6903560 1.772894 1.8116325 0.9991851 2428.449 756.0963
#> 3: 0.5804142 0.611261 0.6269922 0.9989755 1151.603 747.4970
```

### Reference day of the week effect

The underlying data we are trying to nowcast clearly has day of week
periodicity. In our default model, a group specific random walk on the
log of notified cases, this is not accounted for. Accounting for this,
using a random effect on the day of the week is likely to improve
nowcast performance and reduce the computation needed to fit the model.
We can specify this using the `enw_expectation()`

module and
the formula interface as follows.

```
expectation_module <- enw_expectation(
~ 0 + (1 | day_of_week) + (1 | day:.group), data = pobs
)
exp_nowcast <- epinowcast(pobs,
expectation = expectation_module,
fit = fit,
model = multithread_model,
priors = priors
)
```

We again visualise the nowcasts

```
plot(exp_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")
```

### Posterior predictions

In order to identify areas where the current model is poorly reproducing the data we plot the posterior predictions against the data. This plot is faceted age group and reference data with the y axis showing the number of observations reported on a given day for a given reference day and the x axis showing the report date. We see fairly clearly oscillations in reported cases every 7 days which is expressed in the plot by oscillations in each facet that appear to move from left to right across facets. This indicates that some kind of week day adjustment may be needed.

```
plot(exp_nowcast, type = "posterior") +
facet_wrap(vars(age_group, reference_date), scales = "free")
```

### Reporting day of the week effect

As noted using the posterior predictions from the simple model fit
above there appears to be a day of the week effect for reported
observations. To adjust for this we introduce a random effect for day of
the week by date of report using the following helper function which
uses the metadata produced by `enw_preprocess_data()`

. Note
that `epinowcast`

uses a sparse design matrix to reduce
runtimes for some modules so the design matrix shows only unique rows
with `index`

containing the mapping to the full design
matrix.

`report_module_dow <- enw_report(~ (1 | day_of_week), data = pobs)`

We now repeat the nowcasting step with the day of the week reporting model included.

```
dow_nowcast <- epinowcast(pobs,
report = report_module_dow,
expectation = expectation_module,
fit = fit,
model = multithread_model,
priors = priors
)
```

Nowcast performance looks visually improved but there is notable variation across age groups with the 35-59 year old nowcast appearing quite poor (and as a result the aggregate nowcast also not showing great performance). We could also plot the posterior predictions for this model in the same way as for the previous model.

```
plot(dow_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")
```

### Age group variation

It is quite likely that there is some variation in the reporting delay by age and that this may be driving the variation in nowcast performance noted for the last model. Here we model this using a random effect for 5 year age group (as these were the groups supplied in the data).

`reference_module_age <- enw_reference(~ 1 + (1 | age_group), data = pobs)`

We again nowcast this time using both the age adjusted reference date model and the day of the week adjusted report date model.

```
age_nowcast <- epinowcast(pobs,
reference = reference_module_age,
report = report_module_dow,
expectation = expectation_module,
fit = fit,
model = multithread_model,
priors = priors
)
```

Fit looks slightly better with this adjustment though uncertainty has also increased for all age groups and performance for the final day of data may have reduced compared to the first model.

```
plot(age_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")
```

### Variation based on reference date

It could be the case that reporting delays change over time as well as across age groups. One way of modelling this is to assume piecewise constant variation over time modelled with a first order weekly random walk. An attractive property of this approach is that it limits the number of report date distributions that need to be evaluated in the model to the number of weeks of data and as this is an expensive computational step using this approach to introducing a time-varying parameter limits the additional computational overhead.

```
reference_module_age_week <- enw_reference(
~ 1 + (1 | age_group) + rw(week), data = pobs
)
```

As before we fit the nowcasting model,

```
week_nowcast <- epinowcast(pobs,
reference = reference_module_age_week,
report = report_module_dow,
expectation = expectation_module,
fit = fit,
model = multithread_model,
priors = priors
)
```

In comparison to the previous model it looks like the introduction of variation over time has introduce a slight improvement in capturing hospitalisations in some age groups.

```
plot(week_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")
```

### Variation based on reference date stratified by age

As a final hierarchical model it makes sense to explore whether there is evidence that reporting delays vary by week and age group jointly. In this scenario the assumption is that delays may evolve differently over time for each age group but reporting effects and measurement error are still shared across data sets.

```
reference_module_week_by_age <- enw_reference(
~ 1 + (1 | age_group) + rw(week, by = age_group), data = pobs
)
```

We can now fit this model as before.

```
age_week_nowcast <- epinowcast(pobs,
reference = reference_module_week_by_age,
report = report_module_dow,
expectation = expectation_module,
fit = fit,
model = multithread_model,
priors = priors
)
```

In comparison to the previous model it looks like the introduction of variation over time has introduce a slight improvement in capturing hospitalisations in some age groups.

```
plot(age_week_nowcast, latest_obs = latest_nat_germany) +
facet_wrap(vars(age_group), scales = "free_y")
```

### Independent models for each age group.

The obvious question to ask at this stage is if using a model that jointly fits to all age groups at once is actually beneficial. Here we explore this by fitting the same model as previously (a day of week effect for report date and a random walk on the week of the reference date stratified by age) to each age group independently.

We could define this model with a single call to
`epinowcast`

but fitting each dataset independently but in a
joint setting would likely lead to long fit times for no real benefit.
Instead here we write a small helper function to preprocess our input
data, define report and reference date models and then run a
nowcast.

```
independent_epinowcast <- function(obs, max_delay = 40, ...) {
pobs_ind <- enw_preprocess_data(obs, max_delay = max_delay)
nowcast <- epinowcast(
data = pobs_ind,
reference = enw_reference(~ rw(week), data = pobs_ind),
report = enw_report(~ (1 | day_of_week), data = pobs_ind),
expectation = enw_expectation(
~ 0 + (1 | day_of_week) + (1 | day), data = pobs_ind
),
...
)
nowcast_summary <- summary(
nowcast,
probs = c(0.025, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.975)
)
return(nowcast_summary)
}
```

We can now use this wrapper function on the data available for each age group, summarise the resulting nowcast, and then join these into a single data.frame.

```
options(mc.cores = 2)
independent_nowcast <- map(
split(retro_nat_germany, by = "age_group"),
independent_epinowcast,
fit = enw_fit_opts(
save_warmup = FALSE, output_loglik = TRUE, pp = TRUE,
chains = 2, threads_per_chain = threads,
iter_sampling = 500, iter_warmup = 500,
show_messages = FALSE, refresh = 0,
adapt_delta = 0.95, max_treedepth = 12
),
model = multithread_model,
priors = priors
)
independent_nowcast <- rbindlist(independent_nowcast)
```

As we now have the summarised nowcasts rather than an object of class
`epinowcast`

we need to make use of the underlying plot
function ourselves. Doing so we see that performance is generally quite
good across the board though the width of credible intervals has also
increased. Importantly the 35-59 year old age group is being captured at
least as well as in the heirarchical models with only minor reductions
in performance in other age groups. This suggests that for this dataset
and nowcast date there may be relatively little benefit to jointly
modelling age groups.

```
enw_plot_nowcast_quantiles(
independent_nowcast, latest_obs = latest_nat_germany
) +
facet_wrap(vars(age_group), scales = "free_y")
```

### Alternative models

In all the models defined above we have assumed that the delay
distribution, aside from report date effects, is parametric and has a
lognormal distribution. Both of these assumptions may be less than
optimal. Alternatives include assuming a different distributional form
(such as the gamma distribution which is also supported by
`epinowcast`

) or assuming that the report delay is fully
non-parametric which is not yet supported but will be in future package
versions.

There are any number of additional models we could explore within the
framework supported by `epinowcast`

as well as a large number
of alternative parameterisations that are not yet supported. For
example, we could explore models with more complex reporting day
effects, including holidays (supported in `epinowcast`

either
as a separate effect or by assuming they have the same reporting hazard
as Sundays) and variation over time which would represent reporting
delays changing independently of reference date (this would be similar
to the time varying model we defined above but with this effect
occurring in the report date model rather than in the reference date
model). These choices are data dependent and domain knowledge needs to
be used to assess the likely mechanisms.

If interested in expanding the functionality of the underlying model
to address some of these issues note that `epinowcast`

allows
users to pass in their own models meaning that alternative
parameterisations may be easily tested within the package
infrastructure. Once this testing has been done alterations that
increase the flexibility of the package model and improves its defaults
are very welcome as pull requests.

## Evaluation

As we have only nowcast a single date, and we visualised performance as we went, this evaluation is anything but complete or rigorous but we can give some examples of how we might evaluate performance more generally and potentially draw some useful initial conclusions.

We first list all models (including the simplest case) and give them informative names,

```
nowcasts <- list(
"Reference: Fixed, Report: Fixed" = exp_nowcast,
"Reference: Fixed, Report: Day of week" = dow_nowcast,
"Reference: Age, Report: Day of week" = age_nowcast,
"Reference: Age and week, Report: Day of week" = week_nowcast,
"Reference: Age and week by age, Report: Day of week" = age_week_nowcast
)
```

and then summarise the nowcast posterior for each model and join into
a tidy `data.frame`

to make further analysis easier.

```
summarised_nowcasts <- map(
nowcasts, summary,
probs = c(0.025, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.975)
)
summarised_nowcasts$`Independent by age, Reference: Week, Report: Day of week` <- independent_nowcast # nolint
summarised_nowcasts <- rbindlist(summarised_nowcasts, idcol = "model",
use.names = TRUE)
summarised_nowcasts[, `:=`(
model = factor(
model,
levels = c("Reference: Fixed, Report: Fixed",
"Reference: Fixed, Report: Day of week",
"Reference: Age, Report: Day of week",
"Reference: Age and week, Report: Day of week",
"Reference: Age and week by age, Report: Day of week",
"Independent by age, Reference: Week, Report: Day of week")),
age_group = factor(
age_group,
levels = c("00+", "00-04", "05-14", "15-34", "35-59", "60-79", "80+"))
)]
```

This allows us to plot nowcasts for each model and age group compared to the latest data. Looking at the plot shows some small differences across models with uncertainty generally decreasing as model complexity increases. Some age groups are clearly better nowcast than others with the 35-59 year old age group in particular having poor nowcast coverage.

```
enw_plot_nowcast_quantiles(
summarised_nowcasts, latest_obs = latest_nat_germany
) +
facet_grid(vars(age_group), vars(model), scales = "free_y")
```

As a crude measure of general out of sample performance we can use
the leave one out information criterion as supplied by the
`loo`

package though note this is not typically appropriate
for time series data (where
approximate LFO cross validation is likely to perform better), the
approximation used here to avoid refitting is likely to be poor, and we
are not accounting for this by refitting the model as required.

```
loos <- map(nowcasts, ~ .$fit[[1]]$loo())
loo_compare(loos)
#> elpd_diff se_diff
#> Reference: Age and week, Report: Day of week 0.0 0.0
#> Reference: Age, Report: Day of week -11.1 5.2
#> Reference: Age and week by age, Report: Day of week -11.7 3.3
#> Reference: Fixed, Report: Day of week -50.7 10.5
#> Reference: Fixed, Report: Fixed -611.9 45.2
```

We see that the most model which includes day of the week effects for the date of report substantially outperformed the baseline model with no adjustment and that the more complex models that adjusted for variation by age and week for the date of test improved estimated out of sample performance but uncertainty around these estimates was wide.

More rigorously, we can evaluate the nowcasts using proper scoring
rules from the `scoringutils`

package including the weighted
interval score. Here we limit the nowcasts scored to the last 7 days of
data to make interpretation easier, transform nowcasts into format
required for `scoringutils`

, link with the latest available
data, and the finally call `scoringutils::eval_forecasts()`

.
Note that as we are only scoring a single nowcasts it is difficult to
generalise our findings as this one day of reporting may have been
unusual. To have a more informed view of which model to pick we would
ideally nowcast a range of dates and evaluate each of them.

As a first step we score overall performance. Here we see that the baseline model with no variation actually performs very well with only models that include at least day of the week, age groups and variation by week performing comparably. Other performance characteristics are relatively similar across models (with all models being biased towards underprediction for example).

```
score <- enw_score_nowcast(
summarised_nowcasts,
latest_nat_germany[reference_date > (max(reference_date) - 7)]
)
score |>
summarise_scores(by = "model") |>
kable()
```

model | interval_score | dispersion | underprediction | overprediction | coverage_deviation | bias | ae_median |
---|---|---|---|---|---|---|---|

Reference: Fixed, Report: Fixed | 9.786248 | 3.158477 | 4.225981 | 2.4019466 | -0.1087912 | -0.1020408 | 16.29592 |

Reference: Fixed, Report: Day of week | 10.100157 | 2.326279 | 6.827567 | 0.9466248 | -0.1386185 | -0.2428571 | 15.48980 |

Reference: Age, Report: Day of week | 8.630703 | 2.267425 | 5.730863 | 0.6326531 | -0.1417582 | -0.3673469 | 13.91837 |

Reference: Age and week, Report: Day of week | 7.314954 | 2.790201 | 1.396860 | 3.1275510 | -0.0193093 | 0.0826531 | 12.02041 |

Reference: Age and week by age, Report: Day of week | 6.887651 | 3.004575 | 2.028258 | 1.8546939 | 0.0058085 | -0.1387755 | 11.29592 |

Independent by age, Reference: Week, Report: Day of week | 7.292314 | 3.869724 | 1.583673 | 1.8387755 | 0.0340659 | -0.0622449 | 12.31633 |

Finally we look across all scores relative to the simple model with no variation. This nicely captures the role of the last data point on performance but also highlights more variation across reference dates and age groups between models. The difference in performance between the hierarchical by age models and the model that treats age groups independently is also very clear.

```
age_date_score <- score |>
summarise_scores(by = c("model", "reference_date", "age_group"))
fixed_score <- age_date_score[
model %in% "Reference: Fixed, Report: Fixed",
.(reference_date, age_group, fixed_is = interval_score)
]
age_date_score <- merge(
age_date_score, fixed_score, by = c("reference_date", "age_group")
)
age_date_score <- age_date_score[, interval_score := interval_score / fixed_is]
age_date_score <- age_date_score[!model %in% "Reference: Fixed, Report: Fixed"]
plot <- ggplot(age_date_score) +
aes(x = reference_date, y = interval_score, col = model) +
geom_hline(yintercept = 1, linetype = 2, linewidth = 1.2, alpha = 0.5) +
geom_line(linewidth = 1.1, alpha = 0.6) +
geom_point(size = 1.2) +
facet_wrap(vars(age_group)) +
scale_color_brewer(palette = "Dark2") +
scale_y_log10(labels = scales::percent)
plot <- enw_plot_theme(plot) +
labs(x = "Reference date",
y = "Weighted interval score (relative to Reference: Fixed, Report: Fixed model)") + # nolint
guides(col = guide_legend(title = "Model", ncol = 2))
plot
```

## Summary

In this vignette we showcased using `epinowcast`

to
nowcast age stratified COVID-19 hospitalisations in Germany by date of
test with a series of increasingly complex models motivated by the data.
We also showed some simple methods for exploring these nowcasts and
evaluating them.

Using the limited information available to us (as we have only nowcast a single date and used performance for this date to motivate new models) it appears that all models performed acceptably and that, aside from the last data point, models with age and day of the week effects likely performed better. It is also fairly clear that performance degrades as the amount of reported data is reduced which intuitively makes sense with performance being particularly sensitive the first day reported data is available (i.e “now”). Our apparent finding that delays evolve fairly independently across age groups motivates choosing a model that is very flexible to this, at least for the date of reference model.

Despite the independent model and the fixed effect model both doing well overall, for most applications if choosing a model based on this evaluation, I would likely select a relatively flexible model (with day of the week, age group, and age stratified weekly variation) relying on the hierarchical structure to limit overfitting, and excepting a small reduction in performance in some edge cases (with the hope that these are edge cases and not a common feature of the data). However in practice, I would want to explore nowcasting and evaluating more dates and if possible a greater range of model structures (as discussed in the alternative modelling section of this vignette). Note that with the proper scoring approach taken here to understand model performance (and commonly used in the literature) we are ranking models based on absolute errors and so groups with high counts (such as the 35-59 age group) are more important to nowcast correctly than groups with smaller counts (such as those aged 80+).