
Ensures that all reference and report dates are present for all groups, based on the minimum and maximum dates found in the data. This function may be useful when preprocessing data. In general, every feature that you may use as a grouping variable or as a covariate needs to be included in the by argument.

Usage

enw_complete_dates(obs, by = c(), max_delay, missing_reference = TRUE)

Arguments

obs

A data frame containing at least the following variables: reference_date (index date of interest), report_date (report date for observations), and confirm (cumulative observations by reference and report date).

by

A character vector describing the stratification of observations. Defaults to no grouping. Use this when modelling multiple time series so that each series can be identified in downstream modelling.

max_delay

Numeric, defaults to 20. The maximum number of days to include in the delay distribution. Computation scales non-linearly with this setting, so consider carefully what maximum makes sense for your data. Note that this is zero-indexed, so it includes the reference date and max_delay - 1 other days.
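The zero indexing can be illustrated with a small base R sketch. The value of max_delay and the reference date below are chosen purely for illustration and are not package defaults; the code only shows the arithmetic described above and does not call the package.

```r
# Illustrative values only (assumptions, not package defaults)
max_delay <- 3
reference_date <- as.Date("2021-10-01")

# Zero-indexed delays: 0 is the reference date itself,
# followed by max_delay - 1 further days
delays <- seq_len(max_delay) - 1

# Report dates covered by this delay window
report_dates <- reference_date + delays
print(report_dates)
```

With max_delay set to 3, the window covers the reference date and the two days that follow it.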

missing_reference

Logical, defaults to TRUE. Should entries for cases with a missing reference date be completed as well?

Value

A data.table with completed entries for all combinations of reference dates, groups and possible report dates.
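As a rough illustration of what "all combinations" means for a single group, the base R sketch below builds the grid of reference and report dates between the minimum and maximum dates. The helper complete_grid is hypothetical and is not part of the package. It assumes that a report date never precedes its reference date, and it neither fills the confirm column nor adds rows for missing reference dates.

```r
# Hypothetical helper, not the package implementation:
# builds only the date grid for one group.
complete_grid <- function(dates) {
  dates <- as.Date(dates)
  all_days <- seq(min(dates), max(dates), by = "day")
  n <- length(all_days)
  # Every pairing of reference date and report date
  grid <- data.frame(
    report_date = rep(all_days, times = n),
    reference_date = rep(all_days, each = n)
  )
  # Assumption: a report cannot precede the date it refers to
  grid <- grid[grid$report_date >= grid$reference_date, ]
  rownames(grid) <- NULL
  grid
}

grid <- complete_grid(c("2021-10-01", "2021-10-03"))
print(grid)
```

For three days this gives six date pairs: three report dates for the first reference date, two for the second, and one for the third.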

Examples

obs <- data.frame(
  report_date = c("2021-10-01", "2021-10-03"), reference_date = "2021-10-01",
  confirm = 1
)
enw_complete_dates(obs)
#>    report_date reference_date confirm
#> 1:  2021-10-01           <NA>       0
#> 2:  2021-10-02           <NA>       0
#> 3:  2021-10-03           <NA>       0
#> 4:  2021-10-01     2021-10-01       1
#> 5:  2021-10-02     2021-10-01       1
#> 6:  2021-10-03     2021-10-01       1
#> 7:  2021-10-02     2021-10-02       0
#> 8:  2021-10-03     2021-10-02       0
#> 9:  2021-10-03     2021-10-03       0