Check whether the maximum delay specified by the user is long enough, and raise warnings if it may not be. This is done by computing the share of reference dates for which the cumulative case count at the maximum delay is below a desired coverage.
Usage
check_max_delay(
  data,
  max_delay = data$max_delay,
  cum_coverage = 0.8,
  maxdelay_quantile_outlier = 0.97,
  warn = TRUE,
  warn_internal = FALSE
)
Arguments
- data
Output from enw_preprocess_data().
- max_delay
The maximum delay to model in the delay distribution, specified in units of the timestep (e.g., if timestep = "week", then max_delay = 3 means 3 weeks). If not specified, the maximum observed delay is assumed to be the true maximum delay in the model. Otherwise, an integer greater than or equal to 1 can be specified. Observations with delays larger than the maximum delay will be dropped. If the specified maximum delay is too short, nowcasts can be biased because important parts of the true delay distribution are cut off. At the same time, computational cost scales non-linearly with this setting, so the maximum delay should be as long as necessary, but not much longer.
Steps to take to determine the maximum delay:
1. Consider what is realistic and relevant for your application.
2. Check the proportion of observations reported (prop_reported) by delay in the new_confirm output of enw_preprocess_obs.
3. Use check_max_delay() to check the coverage of a candidate max_delay.
4. If in doubt, check whether increasing the maximum delay noticeably changes the delay distribution or nowcasts as estimated by epinowcast. If it does, your maximum delay may still be too short.
Note that delays are zero indexed and so include the reference date and max_delay - 1 other intervals (i.e., a max_delay of 1 corresponds to no delay).
- cum_coverage
The desired proportion of cases that the maximum delay should cover. Defaults to 0.8 (80%).
- maxdelay_quantile_outlier
Only reference dates sufficiently far in the past, as determined by the maximum observed delay, are included (see Details). Instead of the overall maximum observed delay, this quantile of the maximum observed delays across all reference dates is used, which is more robust against outliers. Defaults to 0.97 (97%).
- warn
Should a warning be issued if the cumulative case count is below cum_coverage for the majority of reference dates?
- warn_internal
Should only be TRUE if this function is called internally by another epinowcast function. In that case, warnings are adjusted to avoid confusing the user.
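The zero-indexed convention for max_delay can be illustrated with a few lines of base R. This is a minimal sketch on made-up delays, not epinowcast code: it only shows which delays a given max_delay covers and which observations would be dropped.

```r
# Hypothetical reporting delays (in timesteps) of individual observations
delays <- c(0, 1, 1, 2, 3, 5)

# Delays are zero indexed: max_delay = 3 covers delays 0, 1 and 2,
# i.e. the reference date plus max_delay - 1 = 2 further intervals
max_delay <- 3
modelled_delays <- seq_len(max_delay) - 1
kept <- delays[delays < max_delay]      # 0 1 1 2
dropped <- delays[delays >= max_delay]  # 3 5

# max_delay = 1 corresponds to no delay: only delay 0 is modelled
no_delay <- seq_len(1) - 1              # 0
```

With timestep = "week", the same max_delay = 3 would span the reference week plus the two following weeks.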
Value
A data.table with, for each group (and for all groups combined, shown as "all"), the target coverage (coverage) and the share of reference dates where the cumulative case count is below it (below_coverage).
Details
When data is very sparse (e.g., predominantly zero counts), the function may not be able to compute meaningful coverage statistics. In such cases, a warning is issued and the function treats the data as having no coverage issues. This typically occurs when groups have very few non-zero observations or when the specified max_delay is too large relative to the available data.
Coverage is computed with respect to the maximum observed case count for the corresponding reference date. Because the maximum observed case count is likely smaller than the true overall case count for reference dates that are not yet fully observed (due to right truncation), only reference dates that are more than the maximum observed delay ago are included (in practice, the maxdelay_quantile_outlier quantile of the maximum observed delays is used). Still, because only the maximum observed delay is available, not the unknown true maximum delay, the computed coverage values should be interpreted with care: they are only proxies for the true coverage.
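The coverage check described above can be sketched in base R. This is an illustrative reimplementation under simplifying assumptions, not the epinowcast source: the hypothetical helper share_below_coverage() takes a single group's reporting triangle as a matrix of cumulative counts (rows are reference dates ordered from oldest to most recent, columns are delays 0, 1, ...; NA marks delays not yet observed) rather than the output of enw_preprocess_data(). It follows the rule stated above of keeping only reference dates strictly more than the quantile-based maximum observed delay in the past, using R's default quantile type.

```r
# Hypothetical helper: share of sufficiently old reference dates whose
# cumulative count at delay max_delay - 1 is below cum_coverage times the
# maximum observed cumulative count for that reference date.
share_below_coverage <- function(cum_counts, max_delay, cum_coverage = 0.8,
                                 maxdelay_quantile_outlier = 0.97) {
  n_dates <- nrow(cum_counts)
  # Maximum observed (zero-indexed) delay per reference date
  max_obs_delay <- apply(cum_counts, 1, function(x) max(which(!is.na(x))) - 1)
  # A high quantile instead of the maximum is more robust against outliers
  delay_cutoff <- quantile(max_obs_delay, maxdelay_quantile_outlier,
                           names = FALSE)
  # Rows run from oldest to most recent, so the last row has age 0
  age <- (n_dates - 1):0
  # Keep only reference dates more than delay_cutoff timesteps in the past
  counts <- cum_counts[age > delay_cutoff, , drop = FALSE]
  if (nrow(counts) == 0) {
    return(NA_real_)
  }
  # Column max_delay holds the cumulative count at delay max_delay - 1
  at_max_delay <- counts[, min(max_delay, ncol(counts))]
  coverage <- at_max_delay / apply(counts, 1, max, na.rm = TRUE)
  # Reference dates with only zero counts give NaN coverage and are skipped
  mean(coverage < cum_coverage, na.rm = TRUE)
}

# Toy reporting triangle: 8 reference dates, delays 0 to 3
cum_counts <- matrix(c(
  4,  7,  9, 10,
  2,  6,  8, 10,
  5,  6,  9, 10,
  1,  3,  5,  8,
  2,  4,  6,  9,
  1,  3,  6, NA,
  3,  5, NA, NA,
  2, NA, NA, NA
), ncol = 4, byrow = TRUE)

# Share of included reference dates below 80% coverage per candidate max_delay
sapply(c(2, 3, 4), function(d) share_below_coverage(cum_counts, max_delay = d))
# returns c(1, 0.25, 0)
```

In this toy triangle the quantile-based cutoff is 3, so only the four oldest reference dates are used. With max_delay = 2 all four fall short of 80% coverage, whereas max_delay = 3 leaves only one of them below the target, which is the kind of comparison step 3 of the max_delay guidance suggests.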
See also
Functions used for checking inputs
check_design_matrix_sparsity(), check_group(), check_group_date_unique(), check_module(), check_modules_compatible(), check_numeric_timestep(), check_observation_indicator(), check_quantiles(), check_timestep(), check_timestep_by_date(), check_timestep_by_group()
Examples
pobs <- enw_example(type = "preprocessed_observations")
check_max_delay(pobs, max_delay = 20, cum_coverage = 0.8)
#> .group coverage below_coverage
#> <char> <num> <num>
#> 1: 1 0.8 0
#> 2: all 0.8 0