
This function is used internally by enw_preprocess_data() to combine the various pieces of processed observed data into a single object. It is exposed to users to allow modular data preprocessing, although this is not currently recommended. See the documentation and code of enw_preprocess_data() for more on the expected inputs.

Usage

enw_construct_data(
  obs,
  new_confirm,
  latest,
  missing_reference,
  reporting_triangle,
  metareport,
  metareference,
  metadelay,
  by,
  max_delay
)

Arguments

obs

Observations with the addition of empirical reporting proportions and restricted to the specified maximum delay.

new_confirm

Incidence of notifications by reference and report date. Empirical reporting distributions are also added.

latest

The latest available observations.

missing_reference

A data.frame of reported observations that are missing the reference date.

reporting_triangle

Incident observations by report and reference date in the standard reporting triangle matrix format.

metareport

Metadata for report dates.

metareference

Metadata for reference dates derived from observations.

metadelay

Metadata for reporting delays produced using enw_delay_metadata().

by

A character vector describing the stratification of observations. This defaults to no grouping. Use it when modelling multiple time series so that each series can be identified in downstream modelling.

max_delay

Numeric, defaults to 20. Must be an integer greater than or equal to 1 (internally it is coerced to one using as.integer()). The maximum number of days to include in the delay distribution. Computation scales non-linearly with this setting, so consider carefully what maximum makes sense for your data. Note that this is zero indexed and so includes the reference date and max_delay - 1 other days (i.e. a max_delay of 1 corresponds to no delay).
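
The zero indexing described above can be illustrated with a minimal base R sketch. It does not call any package code; the variable delays is a hypothetical name used only for illustration, and the value 20 is simply the documented default.

```r
# Hypothetical value for illustration; matches the documented default.
max_delay <- 20

# Coerce to integer, as the documentation says happens internally.
max_delay <- as.integer(max_delay)

# Delays are zero indexed: delay 0 is the reference date itself,
# followed by max_delay - 1 further days.
delays <- seq(0, max_delay - 1)

length(delays) # number of delays covered
range(delays)  # smallest and largest delay covered
```

Setting max_delay to 1 in this sketch leaves only delay 0, which is the "no delay" case mentioned above.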

Value

A data.table containing processed observations as a series of nested data.frames as well as variables containing metadata. These are:

  • obs: Observations with the addition of empirical reporting proportions and restricted to the specified maximum delay.

  • new_confirm: Incidence of notifications by reference and report date. Empirical reporting distributions are also added.

  • latest: The latest available observations.

  • missing_reference: Observations missing reference dates.

  • reporting_triangle: Incident observations by report and reference date in the standard reporting triangle matrix format.

  • metareference: Metadata for reference dates derived from observations.

  • metareport: Metadata for report dates.

  • metadelay: Metadata for reporting delays produced using enw_delay_metadata().

  • time: Numeric, number of timepoints in the data.

  • snapshots: Numeric, number of available data snapshots to use for nowcasting.

  • groups: Numeric, number of groups/strata in the supplied observations (set using by).

  • max_delay: Numeric, the maximum delay in the processed data.

  • max_date: The maximum available report date.

Examples

pobs <- enw_example("preprocessed")
enw_construct_data(
  obs = pobs$obs[[1]],
  new_confirm = pobs$new_confirm[[1]],
  latest = pobs$latest[[1]],
  missing_reference = pobs$missing_reference[[1]],
  reporting_triangle = pobs$reporting_triangle[[1]],
  metareport = pobs$metareport[[1]],
  metareference = pobs$metareference[[1]],
  metadelay = enw_delay_metadata(max_delay = 20),
  by = c(),
  max_delay = pobs$max_delay[[1]]
)
#>                    obs          new_confirm              latest
#> 1: <data.table[671x9]> <data.table[630x11]> <data.table[41x10]>
#>     missing_reference  reporting_triangle      metareference
#> 1: <data.table[41x6]> <data.table[41x22]> <data.table[41x9]>
#>             metareport          metadelay time snapshots by groups max_delay
#> 1: <data.table[60x12]> <data.table[20x4]>   41        41         1        20
#>      max_date
#> 1: 2021-08-22
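
Because the nested components are stored as list columns, individual tables are extracted with [[1]], as the call above already does for its inputs. The following is a small sketch of that access pattern, assuming the epinowcast package is installed; it uses only the accessors that appear in the example above, and the local names obs and rt are hypothetical.

```r
library(epinowcast)

# Example preprocessed data shipped with the package.
pobs <- enw_example("preprocessed")

# Each nested component sits in a list column, so [[1]] returns the table.
obs <- pobs$obs[[1]]
rt <- pobs$reporting_triangle[[1]]

# Inspect the first rows and the dimensions of the extracted tables.
head(obs)
dim(rt)

# Scalar metadata is extracted the same way.
pobs$max_delay[[1]]
```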