Skip to contents

Check if maximum delay specified by the user is long enough and raise potential warnings. This is achieved by computing the share of reference dates where the cumulative case count is below some aspired coverage.

Usage

check_max_delay(
  data,
  max_delay = data$max_delay,
  cum_coverage = 0.8,
  maxdelay_quantile_outlier = 0.97,
  warn = TRUE,
  warn_internal = FALSE
)

Arguments

data

Output from enw_preprocess_data().

max_delay

The maximum number of days to model in the delay distribution. Must be an integer greater than or equal to 1. Observations with delays larger then the maximum delay will be dropped. If the specified maximum delay is too short, nowcasts can be biased as important parts of the true delay distribution are cut off. At the same time, computational cost scales non-linearly with this setting, so you want the maximum delay to be as long as necessary, but not much longer. Consider what delays are realistic for your application, and when in doubt, check if increasing the maximum delay noticeably changes the delay distribution or nowcasts as estimated by epinowcast. If it does, your maximum delay may still be too short. Note that delays are zero indexed and so include the reference date and max_delay - 1 other days (i.e. a max_delay of 1 corresponds to no delay). You can use check_max_delay() to check the coverage of a delay distribution for different maximum delays.

cum_coverage

The aspired percentage of cases that the maximum delay should cover. Defaults to 0.8 (80%).

maxdelay_quantile_outlier

Only reference dates sufficiently far in the past, determined based on the maximum observed delay, are included (see details). Instead of the overall maximum observed delay, a quantile of the maximum observed delay over all reference dates is used. This is more robust against outliers. Defaults to 0.97 (97%).

warn

Should a warning be issued if the cumulative case count is below cum_coverage for the majority of reference dates?

warn_internal

Should only be TRUE if this function is called internally by another epinowcast function. Then, warnings are adjusted to avoid confusing the user.

Value

A data.table with the share of reference dates where the cumulative case count is below cum_coverage, stratified by group.

Details

The coverage is with respect to the maximum observed case count for the corresponding reference date. As the maximum observed case count is likely smaller than the true overall case count for not yet fully observed reference dates (due to right truncation), only reference dates that are more than the maximum observed delay ago are included. Still, because we can only use the maximum observed delay, not the unknown true maximum delay, the computed coverage values should be interpreted with care, as they are only proxies for the true coverage.

Examples

pobs <- enw_example(type = "preprocessed_observations")
check_max_delay(pobs, max_delay = 20, cum_coverage = 0.8)
#>    .group coverage below_coverage
#>    <char>    <num>          <num>
#> 1:      1      0.8              0
#> 2:    all      0.8              0