This function is used internally by enw_preprocess_data() to combine
various pieces of processed observed data into a single object. It
is exposed to the user in order to allow for modular data preprocessing
though this is not currently recommended. See documentation and code
of enw_preprocess_data() for more on the expected inputs.
Usage
enw_construct_data(
obs,
new_confirm,
latest,
missing_reference,
reporting_triangle,
metareport,
metareference,
metadelay,
max_delay,
timestep,
by
)Arguments
- obs
Observations with the addition of empirical reporting proportions and and restricted to the specified maximum delay.
- new_confirm
Incidence of notifications by reference and report date. Empirical reporting distributions are also added.
- latest
The latest available observations.
- missing_reference
A
data.frameof reported observations that are missing the reference date.- reporting_triangle
Incident observations by report and reference date in the standard reporting triangle matrix format.
- metareport
Metadata for report dates.
- metareference
Metadata reference dates derived from observations.
- metadelay
Metadata for reporting delays produced using
enw_metadata_delay().- max_delay
The maximum delay to model in the delay distribution, specified in units of the timestep (e.g., if
timestep = "week", thenmax_delay = 3means 3 weeks). If not specified the maximum observed delay is assumed to be the true maximum delay in the model. Otherwise, an integer greater than or equal to 1 can be specified. Observations with delays larger than the maximum delay will be dropped. If the specified maximum delay is too short, nowcasts can be biased as important parts of the true delay distribution are cut off. At the same time, computational cost scales non-linearly with this setting, so you want the maximum delay to be as long as necessary, but not much longer.Steps to take to determine the maximum delay:
Consider what is realistic and relevant for your application.
Check the proportion of observations reported (
prop_reported) by delay in thenew_confirmoutput ofenw_preprocess_obs.Use
check_max_delay()to check the coverage of a candidatemax_delay.If in doubt, check if increasing the maximum delay noticeably changes the delay distribution or nowcasts as estimated by
epinowcast. If it does, your maximum delay may still be too short.
Note that delays are zero indexed and so include the reference date and
max_delay - 1other intervals (i.e. amax_delayof 1 corresponds to no delay).- timestep
The timestep to used in the process model (i.e. the reference date model). This can be a string ("day", "week", "month") or a numeric whole number representing the number of days. If your data does not have this timestep then you may wish to make use of
enw_aggregate_cumulative()to aggregate your data to the desired timestep.- by
A character vector describing the stratification of observations. This defaults to no grouping. This should be used when modelling multiple time series in order to identify them for downstream modelling
Value
A data.table containing processed observations as a series of nested data.frames as well as variables containing metadata. These are:
obs: (observations with the addition of empirical reporting proportions and restricted to the specified maximum delay).new_confirm: Incidence of notifications by reference and report date. Empirical reporting distributions are also added.latest: The latest available observations.missing_reference: Observations missing reference dates.reporting_triangle: Incident observations by report and reference date in the standard reporting triangle matrix format.metareference: Metadata reference dates derived from observations.metrareport: Metadata for report dates.metadelay: Metadata for reporting delays produced usingenw_metadata_delay().max_delay: Maximum delay to be modelled by epinowcast.time: Numeric, number of timepoints in the data.snapshots: Numeric, number of available data snapshots to use for nowcasting.groups: Numeric, Number of groups/strata in the supplied observations (set usingby).max_date: The maximum available report date.
See also
Preprocessing functions
enw_add_delay(),
enw_add_max_reported(),
enw_add_metaobs_features(),
enw_assign_group(),
enw_complete_dates(),
enw_extend_date(),
enw_filter_delay(),
enw_filter_reference_dates(),
enw_filter_report_dates(),
enw_flag_observed_observations(),
enw_impute_na_observations(),
enw_latest_data(),
enw_metadata(),
enw_metadata_delay(),
enw_missing_reference(),
enw_preprocess_data(),
enw_reporting_triangle(),
enw_reporting_triangle_to_long()
Examples
pobs <- enw_example("preprocessed")
enw_construct_data(
obs = pobs$obs[[1]],
new_confirm = pobs$new_confirm[[1]],
latest = pobs$latest[[1]],
missing_reference = pobs$missing_reference[[1]],
reporting_triangle = pobs$reporting_triangle[[1]],
metareport = pobs$metareport[[1]],
metareference = pobs$metareference[[1]],
metadelay = pobs$metadelay[[1]],
max_delay = pobs$max_delay,
timestep = pobs$timestep[[1]],
by = c()
)
#> obs new_confirm latest
#> <list> <list> <list>
#> 1: <data.table[650x9]> <data.table[610x11]> <data.table[40x10]>
#> missing_reference reporting_triangle metareference
#> <list> <list> <list>
#> 1: <data.table[40x6]> <data.table[40x22]> <data.table[40x9]>
#> metareport metadelay max_delay time snapshots by
#> <list> <list> <num> <int> <int> <list>
#> 1: <data.table[59x12]> <data.table[20x5]> 20 40 40 [NULL]
#> groups max_date timestep
#> <int> <IDat> <char>
#> 1: 1 2021-08-22 day
