
Calculate delay metadata from the supplied maximum delay, independently of other metadata or date indexing. These data are meant to be used in conjunction with metadata on the date of reference. Users can build additional features on this data.frame or use this function to regenerate it in the output of enw_preprocess_data().

Usage

enw_delay_metadata(max_delay = 20, breaks = 4, timestep = "day")

Arguments

max_delay

Numeric, defaults to 20. Must be an integer greater than or equal to 1 (internally it is coerced to one using as.integer()). The maximum number of days to include in the delay distribution. Computation scales non-linearly with this setting, so consider carefully what maximum makes sense for your data. Note that this is zero indexed and so includes the reference date and max_delay - 1 other days (i.e. a max_delay of 1 corresponds to no delay). If a max_delay greater than the maximum delay in the data is supplied, enw_preprocess_data() will throw a warning, although in some cases this may be appropriate (e.g. at the beginning of a time series). In these cases the user should check the model specification carefully, as the model will be extrapolating beyond the observed data.
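The zero indexing described above can be illustrated with a short base R sketch. This is not the package's internal code; the variable names are chosen here for illustration only.

```r
# Illustration of zero indexing in base R (not the package's internals).
max_delay <- 20L

# Delay 0 is the reference date itself, so the last delay is max_delay - 1.
delays <- seq(0L, max_delay - 1L)
length(delays)
range(delays)

# A max_delay of 1 leaves only delay 0, i.e. no reporting delay.
no_delay <- seq(0L, 1L - 1L)
no_delay
```

A max_delay of 20 therefore covers delays 0 to 19, which matches the 20 rows in the example output at the end of this page.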

breaks

Numeric, defaults to 4. The number of breaks to use when constructing a categorised version of numeric delays.

timestep

The timestep to use. This can be a string ("day", "week", "month") or a whole number giving the number of days.

Value

A data.frame of delay metadata. This includes:

  • delay: The numeric delay from reference date to report.

  • delay_cat: The categorised delay. This may be useful for model building.

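  • delay_week: The numeric week in which the delay falls, counted from the reference date (in the example output, delays 0 to 6 are week 0 and delays 7 to 13 are week 1). This again may be useful for model building.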

  • delay_head: A logical variable defining if the delay is in the lower 25% of the potential delays. This may be particularly useful when building models that assume a parametric distribution in order to increase the weight of the head of the reporting distribution in a pragmatic way.

  • delay_tail: A logical variable defining if the delay is in the upper 25% of the potential delays (i.e. above the 75th percentile). This may be particularly useful when building models that assume a parametric distribution in order to increase the weight of the tail of the reporting distribution in a pragmatic way.
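As a rough guide to how these columns relate to delay, the base R sketch below reproduces the values shown in the example output for max_delay = 20 and breaks = 4 with a daily timestep. It is a reconstruction under assumptions, not the package's implementation: the use of quantile() for the head and tail flags, integer division by 7 for the week, and evenly spaced cut() breaks are all inferred from that output.

```r
# Reconstruction of the documented columns in base R. Assumed logic,
# inferred from the example output; the package may compute these differently.
delay <- 0:19

sketch <- data.frame(
  delay = delay,
  # Four left-closed bins of width 5: [0,5), [5,10), [10,15), [15,20)
  delay_cat = cut(delay, breaks = seq(0, 20, by = 5), right = FALSE),
  # Whole weeks since the reference date
  delay_week = as.integer(delay %/% 7),
  # Below the 25th percentile of the potential delays
  delay_head = delay < quantile(delay, probs = 0.25),
  # Above the 75th percentile of the potential delays
  delay_tail = delay > quantile(delay, probs = 0.75)
)

sketch
```

With 20 potential delays, five rows are flagged as head (delays 0 to 4) and five as tail (delays 15 to 19).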

Examples

enw_delay_metadata(20, breaks = 4)
#>     delay delay_cat delay_week delay_head delay_tail
#>     <int>    <fctr>      <int>     <lgcl>     <lgcl>
#>  1:     0     [0,5)          0       TRUE      FALSE
#>  2:     1     [0,5)          0       TRUE      FALSE
#>  3:     2     [0,5)          0       TRUE      FALSE
#>  4:     3     [0,5)          0       TRUE      FALSE
#>  5:     4     [0,5)          0       TRUE      FALSE
#>  6:     5    [5,10)          0      FALSE      FALSE
#>  7:     6    [5,10)          0      FALSE      FALSE
#>  8:     7    [5,10)          1      FALSE      FALSE
#>  9:     8    [5,10)          1      FALSE      FALSE
#> 10:     9    [5,10)          1      FALSE      FALSE
#> 11:    10   [10,15)          1      FALSE      FALSE
#> 12:    11   [10,15)          1      FALSE      FALSE
#> 13:    12   [10,15)          1      FALSE      FALSE
#> 14:    13   [10,15)          1      FALSE      FALSE
#> 15:    14   [10,15)          2      FALSE      FALSE
#> 16:    15   [15,20)          2      FALSE       TRUE
#> 17:    16   [15,20)          2      FALSE       TRUE
#> 18:    17   [15,20)          2      FALSE       TRUE
#> 19:    18   [15,20)          2      FALSE       TRUE
#> 20:    19   [15,20)          2      FALSE       TRUE
#>     delay delay_cat delay_week delay_head delay_tail
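
The description mentions building additional features on the returned data.frame. A minimal sketch of that, assuming the epinowcast package is installed, might look like the following. The delay_first_week column is a hypothetical feature invented for this illustration and is not created by the package.

```r
library(epinowcast)

metadelay <- enw_delay_metadata(max_delay = 20, breaks = 4)

# Hypothetical extra feature (not part of the package output): flag delays
# that fall within the first week after the reference date.
metadelay$delay_first_week <- metadelay$delay < 7

table(metadelay$delay_first_week)
```

A column added this way could then be referenced in a model formula in the same way as the built-in delay features.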