Changelog • epinowcast

epinowcast 0.7.0

Model

The fixed-effects design and integrated (d >= 1) arima() and gp() residuals are now centred against the module intercept, decorrelating the intercept from the slopes and from the latent drift to improve sampling geometry. For modules with a free intercept (expr, refp mean, refnp, miss) the design is centred on its observation-weighted column means, as brms does by default, and the grand mean (over time and groups) of the integrated residual is removed. The sampled intercept (<prefix>_int_c) is on the centred scale; the original-scale intercept the prior applies to is recovered as <prefix>_int by undoing both the design and latent centring (a unit-Jacobian shift, so the prior keeps its meaning and the posterior is unchanged, as in EpiNow2’s reproduction-number centring). Only the shared grand-mean level is removed, so each group keeps its own level and drift: a grouped latent (arima(time, group, ...), G > 1) is unchanged in meaning and the reparameterisation is exact for any number of groups. On a weekly random-walk growth model the centred form samples roughly twice as fast at the adapt_delta these models use (it is sharper, so benefits from adapt_delta >= 0.95). Modules without a free intercept (expl, rep) and the log-link refp standard deviation are left uncentred.
Added a gp() formula helper that places an approximate Gaussian process on any module’s linear predictor, the same way arima() and rw() work. It uses a Hilbert-space reduced-rank (spectral) approximation with selectable kernels (Matern 3/2 default, Matern 5/2, Ornstein-Uhlenbeck, squared exponential, periodic) and a basis_prop accuracy-speed control. A gp() term can be placed on the growth rate (expr), the latent-to-obs proportion (expl), the parametric (refp) and non-parametric (refnp) reference delay, the report-time hazards (rep), and the missing-reference proportion (miss). An integer d argument (matching arima()’s d) integrates the process d times: d = 0 is stationary (the default, like EpiNow2’s gp_on = "R0"), d = 1 gives a smoothly drifting trend (like EpiNow2’s default gp_on = "R_t-1"), and d >= 2 integrates further, anchoring the first d values to zero so the level and slope are carried by the fixed effects. The Stan implementation is adapted from EpiNow2 (https://github.com/epiforecasts/EpiNow2, MIT licensed). See #824.
Added a delay-only model that fits the reporting-delay distribution conditional on known per-reference-date totals, treating those totals as fixed truth (the standard delay-estimation pattern of Kalbfleisch & Lawless, 1989; Höhle & an der Heiden, 2014). Enable it with enw_obs(delay_only = TRUE): a delay-only fit is just epinowcast(data, obs = enw_obs(delay_only = TRUE, data = data)), as epinowcast() minimises the (now inert) expectation automatically. The latent process and per-cell observation model are replaced by a (truncated) multinomial likelihood over the reported cells of each reference date. When the known totals are final retrospective totals this is the plain multinomial; when they are running totals observed only up to some horizon the likelihood renormalises over all delays up to the observation cutoff to give the truncated multinomial. An observation_indicator is supported (interior cells unobserved but before the cutoff keep their weight). delay_only = TRUE selects the multinomial likelihood internally regardless of family, warning if a family is supplied. See the delay estimation vignette and #775 and #776. Also adds enw_posterior_delay() to extract posterior samples of the parametric delay distribution; it returns one PMF per reference-design row (with a row column) for delay models with reference covariates, random effects, or time- or group-varying delays.
The parametric reference delay is now discretised with the double interval censoring approach from the primarycensored package, replacing the previous uniform-interval approximation. This more exactly accounts for primary event censoring, secondary interval censoring, and right truncation, and is used unconditionally for the lognormal, gamma, and exponential distributions. The log-logistic distribution has been dropped from enw_reference() because primarycensored does not yet support it (epinowcast/primarycensored#321); it can be restored once upstream support lands. The vendored Stan functions are generated from primarycensored by inst/dev/vendor-primarycensored.R and kept up to date by the check-primarycensored workflow, following the approach used by EpiNow2. The now-unused legacy discretisation Stan functions (discretised_logit_hazard() and its uniform-interval helpers) have been removed. See #848 (addressing #438 and #297) by @seabbs.
The autoregressive part of an arima() latent residual now takes an optional prior on its partial autocorrelations, set through each module’s <prefix>_arima_pacf entry (e.g. expr_arima_pacf). The default keeps the implicit Uniform(-1, 1) from the parameter bounds; a positive standard deviation switches to a Normal prior truncated to (-1, 1) for gentle shrinkage toward weaker autocorrelation.
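
The design centring and intercept recovery described in the first entry of this section can be illustrated in base R. This is a minimal sketch, with ordinary least squares standing in for the Stan model and with made-up covariates (`week`, `dow`) and equal weights; none of it is taken from the package source. It shows that centring each design column on its weighted mean only shifts the intercept by a constant, so the original-scale intercept is recovered exactly and the slopes are unchanged.

```r
# Illustration only: lm() stands in for the Stan model, and the covariates
# and weights below are invented for the example.
set.seed(1)
n <- 200
x <- cbind(week = rnorm(n, mean = 10), dow = rnorm(n, mean = 3))
w <- rep(1, n) # observation weights (all equal here)
y <- as.vector(2 + x %*% c(0.5, -0.3)) + rnorm(n, sd = 0.1)

# Centre each design column on its observation-weighted mean.
x_bar <- colSums(x * w) / sum(w)
x_c <- sweep(x, 2, x_bar)

fit_c <- lm(y ~ x_c)   # intercept estimated on the centred scale
fit_raw <- lm(y ~ x)   # intercept estimated on the original scale

int_c <- coef(fit_c)[[1]]
slopes <- unname(coef(fit_c)[-1])

# Undo the centring: a constant shift of the intercept, slopes untouched.
int <- int_c - sum(x_bar * slopes)

# Correlation between the intercept and the slopes under each form.
cor_centred <- cov2cor(vcov(fit_c))[1, -1]
cor_raw <- cov2cor(vcov(fit_raw))[1, -1]
```

Here `int` matches the intercept of `fit_raw`, and `cor_centred` is numerically zero while `cor_raw` is not, which is the decorrelation the centred parameterisation relies on.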

Package

Lowered the minimum R version from 4.4.0 to 4.3.0 so users on R 4.3.x can install the package. No code in the package relies on features introduced in R 4.4. See #811 by @seabbs.

Documentation

Added a Gaussian process vignette (gaussian-process.Rmd) covering the Hilbert-space spectral approximation, the available kernels, the priors, and the modules a gp() term can be placed on, ported from EpiNow2’s implementation notes. See #824.
Added a temporal aggregation guide vignette covering the weekly timestep, daily-process / weekly-reporting (fitted and structural variants), and a daily benchmark, with weekly-scale CRPS comparison via scoringutils. Replaces the standalone scripts at inst/examples/germany_weekly_process_model.R and inst/examples/germany_weekly_reporting_daily_process_model.R. See #668 by @seabbs.

Bug fixes

Fixed a Stan dimension mismatch when the expectation, reference, report, or missing data formula has an intercept and a single numeric covariate (e.g., r = ~ 1 + week). The fixed-effects design matrix was collapsing to a vector after the intercept was dropped, causing Stan to error with mismatch in number dimensions declared and found in context. See #783 by @seabbs.
Fixed enw_report() recycling the report-date index (rep_findex) when the report axis is longer than time + max_delay - 1, for example after enw_complete_dates(completion_beyond_max_report = TRUE). Previously this emitted a “data length is not a sub-multiple or multiple” warning and mis-mapped report-date effects across groups and times; the number of report dates per group is now read from the report metadata. See #868.
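
The first fix in this section concerns a design matrix collapsing to a vector once the intercept is dropped. One common way this happens in R is single-column indexing without `drop = FALSE`. The sketch below shows that general base R behaviour; it is not the package's actual code, and the `week` covariate is invented for the example.

```r
# Design matrix with an intercept and a single numeric covariate.
X <- model.matrix(~ 1 + week, data = data.frame(week = 1:4))

# Dropping the intercept column by plain indexing collapses the result to a
# vector, which no longer has the two dimensions a matrix argument needs.
collapsed <- X[, -1]
is.matrix(collapsed) # FALSE

# drop = FALSE keeps the 4 x 1 matrix shape.
kept <- X[, -1, drop = FALSE]
dim(kept) # 4 1
```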

epinowcast 0.6.0

This release prepares the package for CRAN submission and introduces new methods for inspecting epinowcast and preprocessed data objects, including print(), summary(), plot() and an enw_get_data() accessor. The getting started vignette has been restructured around a default-first flow that fits and compares three models — the default Poisson model and two alternatives with weekly growth rate, day-of-week effects, negative binomial observations, and contrasting parametric versus non-parametric delay specifications — to show that epinowcast is a modelling toolkit rather than a single model. A new inference methods vignette compares NUTS sampling, standalone pathfinder, and pathfinder used to initialise NUTS (seeding HMC from a pathfinder fit to speed up warmup), and a new data visualisation vignette demonstrates the plot.enw_preprocess_data() method. Support for max_delay = 1 means epinowcast can now be used for purely retrospective fitting of count data (e.g., Rt estimation from fully reported counts) without any nowcasting, alongside fixes to reference-date filtering.

Full details on changes in this release can be found in the following sections or in the GitHub release notes.

Contributors

@seabbs and @sbfnk contributed code to this release.

@seabbs reviewed pull requests for this release.

@seabbs, @sbfnk and @Bisaloo reported bugs, made suggestions, or contributed to discussions that led to improvements in this release.

Breaking changes

Moved scoringutils from Imports to Suggests using delayed S3 method registration via @exportS3Method. Users now need to load scoringutils explicitly to use as_forecast_sample(). See #741.
enw_add_incidence() no longer filters reference dates that precede the earliest report date. Users should now call enw_filter_reference_dates_by_report_start() before enw_add_incidence() to reproduce the previous behaviour. All internal call sites have been updated. See #709 by @seabbs.

Package

Prepared the package for CRAN submission, including moving cmdstanr from Imports to Suggests, removing the Remotes entry for scoringutils, fixing URLs and removing the CmdStan SystemRequirements. See #736 and #747.
Updated install instructions to treat cmdstanr as an optional dependency. See #740.
Added enw_get_data() accessor along with print() and summary() methods for epinowcast objects. See #750.
Improved the epinowcast print() method to show model objects and the maximum tree depth reached during sampling. See #755.
Added a plot.enw_preprocess_data() method for quickly visualising preprocessed data. See #757.
Added enw_obs_at_delay() helper function for extracting observations at a given delay. See #707.
Solved linting issues (implicit returns) in multiple files and fixed remaining violations in R/model-tools.R. See #715 and #717.

Model

Optimised Stan code for efficiency by inlining intermediate variables and removing unnecessary loop guards. See #696 by @seabbs.
Added support for max_delay = 1 so that epinowcast can be used for retrospective Rt estimation without nowcasting. See #759 closing #742.

Documentation

Restructured the getting started vignette around a default-first flow that fits and overlays three models: the default Poisson model with all defaults, and two alternatives with a weekly growth rate expectation, day-of-week report effects and negative binomial observations that differ in their delay specification (parametric lognormal with day-of-week effects versus a non-parametric intercept plus random walk over delay weeks). A new data visualisation section precedes the modelling to motivate model choices. See #749.
Added an inference methods comparison vignette covering NUTS sampling, standalone pathfinder, and pathfinder used to initialise NUTS (a warmup-speedup pattern where the pathfinder fit seeds HMC), with runtime, diagnostic, and posterior parameter comparisons across two model specifications; linked from the features table and pkgdown navigation. See #751 and #777.
Added a data visualisation vignette demonstrating the new plot.enw_preprocess_data() method and its five plot types (observations, cumulative empirical reporting delay, delay fraction heatmap, delay quantiles, and stacked delay timeseries). See #757.
Updated the single-timeseries Rt estimation and main epinowcast vignettes. See #723 and #744.
Clarified how to specify custom priors and updated vignette source files to include prior specification guidance. See #710 and #721.
Refactored vignette headers to use a shared YAML configuration. See #711.
Fixed a citation formatting error. See #705.
Updated the README. See #738, #746 and #771.

Infrastructure

Added a full package linting GitHub Action. See #704.
Changed benchmarking to be comment-triggered and fixed the benchmarks comment workflow so that it skips correctly. See #708 and #720.
Fixed merge queue concurrency groups cancelling each other. See #713.
Bumped GitHub Actions and Docker action dependencies. See #714, #730, #731, #732, #733 and #760.

epinowcast 0.5.0

This release includes minor improvements to the package infrastructure and documentation.

Contributors

@seabbs and @Bisaloo contributed code to this release.

@seabbs reviewed pull requests for this release.

@seabbs and @Bisaloo reported bugs, made suggestions, or contributed to discussions that led to improvements in this release.

Package

Exported enw_rolling_sum() function for aggregating observations over timesteps. This function was previously internal but is needed for users working with non-daily reporting cycles. See #528 by @seabbs and self-reviewed.
Updated minimum R version to 4.4.0 to align with Matrix package.
Updated GitHub Actions to use latest versions (checkout v6, upload-artifact v5).

Model

Added support for structural reporting patterns in enw_report() via new helper functions enw_structural_reporting_metadata() and enw_dayofweek_structural_reporting(). This enables modelling of non-daily reporting cycles (e.g., weekly) with a daily underlying generative model. Stan optimisations include precomputing sparse index lookups for aggregation operations to improve computational efficiency and numerical stability. See #528 by @seabbs and self-reviewed.

Bugs

Fixed difftime vs integer comparison in test-subset_obs.R. See #692.
Fixed lintr 3.2.0 violations to ensure code quality standards are maintained. See #670.

Documentation

Updated features vignette to reflect that structural reporting schedules are now supported rather than in development.

epinowcast 0.4.0

This release adds a new use cases vignette to help users understand when and how to apply the package to different problems. Documentation has been enhanced with clearer guidance on the formula interface, including details on fixed effects, random effects, and random walks, making the package more accessible to users unfamiliar with formula syntax.

Performance improvements include optimised Stan functions, support for sparse design matrices for memory-intensive models, and tightened priors to improve run times. Experimental pathfinder support has been added for rapid prototyping and informing initialisation of HMC runs. Model enhancements include a negative binomial observation model with linear mean-variance relationship and improved probability aggregation support.

The package lifecycle has been updated from experimental to stable, with interface stability expected in this version.

The release has a single breaking change that fixes an off-by-one error in enw_filter_reference_dates() where include_days = n incorrectly returned n + 1 dates. Functions deprecated at version 0.4.0 or earlier have been removed.

A range of bug fixes have been implemented, including fixes for enw_aggregate_cumulative() counting when the maximum delay is an even multiple of the timestep and IDate storage mode compatibility with dplyr workflows.

Full details on changes in this release can be found in the following sections or in the GitHub release notes.

Contributors

@athowes, @kaitejohnson, @jamesmbaazam, @jessalynnsebastian, @Bisaloo, @barbora-sobolova and @seabbs contributed code to this release.

@medewitt, @jessalynnsebastian, @pearsonca, @jamesmbaazam, and @seabbs reviewed pull requests for this release.

@pearsonca, @jessalynnsebastian, @athowes, @medewitt, and @seabbs reported bugs, made suggestions, or contributed to discussions that led to improvements in this release.

Breaking changes

Fixed off-by-one error in enw_filter_reference_dates() where include_days = n incorrectly returned n + 1 dates instead of exactly n dates. Now include_days = 10 returns exactly 10 reference dates, not 11. This brings the function behaviour in line with its documentation and user expectations. Users relying on the previous behaviour will need to increase their include_days arguments by 1 to maintain the same date range. See issue #352 for details.
Removed deprecated functions scheduled for removal at version 0.4.0 or earlier:
- enw_cumulative_to_incidence() (deprecated 0.2.1, use enw_add_incidence())
- enw_incidence_to_cumulative() (deprecated 0.2.1, use enw_add_cumulative())
- enw_delay_filter() (deprecated 0.2.3, use enw_filter_delay())
- enw_delay_metadata() (deprecated 0.2.3, use enw_metadata_delay())
- enw_score_nowcast() (deprecated 0.4.0, use as_forecast_sample())
Removed simulate_double_censored_pmf(). Users should use primarycensored::dprimarycensored() instead for generating double censored PMFs.
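
The include_days off-by-one described at the top of this section is the usual inclusive-versus-exclusive boundary problem. The sketch below reproduces the two counts with plain date arithmetic; it is an illustration under assumed dates, not the implementation of enw_filter_reference_dates().

```r
# Illustrative reference dates; not package data.
ref_dates <- seq(as.Date("2024-01-01"), as.Date("2024-01-31"), by = "day")
include_days <- 10
latest <- max(ref_dates)

# Inclusive lower bound: spans include_days + 1 dates.
off_by_one <- ref_dates[ref_dates >= latest - include_days]

# Exclusive lower bound: exactly include_days dates.
exact <- ref_dates[ref_dates > latest - include_days]

length(off_by_one) # 11
length(exact)      # 10
```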

Bugs

Fixed enw_aggregate_cumulative() incorrectly counting when max_delay is an even multiple of the timestep. The function now completes dates before aggregation and adds missing incidence counts after aggregation to ensure cumulative sums are correctly calculated. Fixes #511.
Fixed IDate storage mode error when using dplyr::filter() before enw_preprocess_data(). The coerce_dt() function now explicitly restores integer storage mode for IDate columns that may have been converted to double storage by dplyr operations whilst preserving the IDate class. This ensures compatibility with both dplyr and data.table workflows. Fixes #557.
Fixed a bug where enw_nowcast_summary() and enw_nowcast_samples() incorrectly selected reference dates to include in their outputs when time steps were not days. See #473 by @jessalynnsebastian and reviewed by @seabbs.
Fixed a bug where enw_expose_stan_fns(), which has been deprecated, was being used in the stan docs for expected_obs(). See #488 by @seabbs and reviewed by @jessalynnsebastian.
Fixed error in enw_preprocess_data() when processing data with predominantly zero counts across multiple groups. The function now handles sparse data gracefully and provides informative warnings when delay coverage statistics cannot be computed. See #541 by @seabbs and self-reviewed.
Fixed stacked bar chart in Rt estimation vignette extending beyond actual reference date range. The plot now correctly limits the x-axis to the range of dates with data. See #634 by @seabbs and self-reviewed.

Package

The package lifecycle has been updated from experimental to stable. The current interface has stabilised and users can expect interface stability in this version, though future major versions may include interface changes. See #370 by @seabbs.
Experimental support for CmdStanModel$pathfinder has been added to the package via enw_pathfinder(). This fitting method approximates the posterior distribution using a variational inference method. It may be useful for rapid prototyping, informing initialisation of HMC runs, and settings where compute time is limited. Likely downsides are poorly calibrated estimates and instability for more complex model formulations. See #464 by @seabbs and reviewed by @medewitt.
Added support for initialising methods in enw_sample(). Currently this allows either the default “random” which draws from the priors (previously the only option) or “pathfinder” which approximates the posterior distribution using the pathfinder variational inference method. Currently this does not support initialising the mass matrix for HMC but will do once support is available in cmdstan. See #504 by @seabbs and reviewed by @jamesmbaazam.
Added checks for partial argument matching and fixed all instances. See #343 by @Bisaloo and reviewed by @seabbs.
Support for probability aggregation has been added to expected_obs(). See #482 by @jessalynnsebastian and reviewed by @seabbs.
Added actions to build precompiled actions both when updated and pushed to main and on a schedule. This aims to avoid issues where the precompiled actions are not up to date with the latest changes. See #494 by @seabbs and reviewed by @sbfnk.
A new interface has been added to scoringutils to allow for scoring nowcasts. This is now available in epinowcast via as_forecast_sample(). See #550 by @seabbs and self-reviewed.

Model

Performance tuned expected_obs() and related functions to improve speed and reduce memory usage. See #501 by @seabbs and reviewed by @jessalynnsebastian.
Used the offset and multiplier stan translation functions to improve the speed of the model. See #501 by @seabbs and reviewed by @jessalynnsebastian.
Removed normalisation of truncated priors as this is not required during inference and increases run time. See #501 by @seabbs and reviewed by @jessalynnsebastian.
Tightened the prior on the overdispersion parameter to provide less support to extreme overdispersion. This change is unlikely to impact results for most users but should help to improve run time. See #501 by @seabbs and reviewed by @jessalynnsebastian.
Optimised internal performance critical stan functions to improve speed and reduce memory usage. See #513 by @seabbs and reviewed by @pearsonca.
Added support for sparse design matrices to the model (see sparse_design in enw_fit_opts()). For very sparse design matrices this can reduce memory requirements and computation time. A heuristic has been added to inform users when sparse design matrices may be useful. See #514 by @seabbs and reviewed by @pearsonca.
Added support for a negative binomial observation model with a linear mean-variance relationship as an option of the model_obs argument of epinowcast(). See #590 by @barbora-sobolova and reviewed by @seabbs.
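
To make the linear mean-variance relationship in the last entry concrete, the sketch below contrasts it with the more common quadratic form using base R's negative binomial parameterisation. The two parameterisations shown are standard textbook forms and are an assumption here; the package's Stan code may scale the overdispersion parameter differently.

```r
# Assumed parameterisations (not taken from the package source):
#   quadratic: Var = mu + mu^2 / phi  (size = phi)
#   linear:    Var = mu * (1 + phi)   (size = mu / phi)
nb_var <- function(mu, size) mu + mu^2 / size

mu <- c(5, 50, 500)
phi <- 2

var_quadratic <- nb_var(mu, size = phi)
var_linear <- nb_var(mu, size = mu / phi)

# Variance-to-mean ratio: grows with mu in the quadratic form and stays
# constant at 1 + phi in the linear form.
ratio_quadratic <- var_quadratic / mu
ratio_linear <- var_linear / mu

# Draws under the linear form use the same base R sampler.
set.seed(1)
draws <- rnbinom(1e5, mu = 50, size = 50 / phi)
```

Under the linear form the relative overdispersion does not depend on the count level, which can matter when a series spans several orders of magnitude.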

Documentation

Clarified in enw_aggregate_cumulative() documentation that observations where report dates do not form a complete timestep will be dropped from the aggregated output. This behaviour is by design to ensure consistent timestep alignment. Addresses #427.
Improved documentation of the formula interface to make it more accessible to users unfamiliar with formula syntax. Enhanced enw_formula() documentation with details on fixed effects, random effects (lme4 syntax), and random walks, including explanation of the ~0 convention for disabling model components and how formulas map to model structure. Added references to relevant R resources and expanded examples. Updated model module documentation (enw_reference(), enw_report(), enw_expectation(), enw_missing()) to clarify formula usage and cross-reference the main formula documentation. Addresses #468 by @seabbs and self-reviewed.
Linked the Stan function documentation to the package website. By @jamesmbaazam in #529 and reviewed by @seabbs.
Added support to render and deploy stan documentation using doxygen and a GitHub Actions workflow. See #500 and #502 by @jamesmbaazam and @seabbs respectively, and cross-reviewed.
Standardised punctuation in the pkgdown reference. See #458 by @athowes and reviewed by @seabbs.
Reduced the adapt_delta and max_treedepth arguments in the vignettes and examples and tested to see that this did not impact the results. See #501 by @seabbs and reviewed by @jessalynnsebastian.
Made vignette progress messaging dependent on the user being interactive. See #501 by @seabbs and reviewed by @jessalynnsebastian.
Added a vignette to document package use cases. See #524 by @kaitejohnson and reviewed by @seabbs.
Switched to using primarycensored for simulating the primary censored and right truncated processes needed to correctly model the discrete delays. See #549 by @seabbs and self-reviewed.

Deprecations

enw_score_nowcast() has been deprecated in favour of scoringutils::score(). See #550 by @seabbs and self-reviewed.

epinowcast 0.3.0

This release brings a range of enhancements, new features, and bug fixes, reflecting the effort of a large number of contributors. It has a single breaking change, which adjusts the default max_delay parameter in enw_preprocess_data() to be the maximum observed delay in the input data. This change aims to encourage users to tailor this setting to their specific datasets and to give them a more reasonable default if they do not.

The package infrastructure has also had significant updates, including improved search functionality on the pkgdown website, the adoption of an organization-level pkgdown theme, the ability to cache Stan models across R sessions, and additional continuous integration tests.

Model enhancements include updated internal handling of PMF discretization and support for non-parametric reference date models, alongside documentation improvements that provide clearer guidance and examples for users.

A range of bug fixes have been implemented, including a fix for a bug in the enw_expectation() module that was causing issues with models containing multiple time series.

Full details on the changes in this release can be found in the following sections or in the GitHub release notes. To see the development timeline of this release see the 0.3.0 project.

Contributors

@jamesmbaazam, @medewitt, @sbfnk, @adrian-lison, @kathsherratt, @natemcintosh, @Bisaloo and @seabbs contributed code to this release.

@jamesmbaazam, @adrian-lison, @sbfnk, @Bisaloo, @pearsonca, @natemcintosh, and @seabbs reviewed pull requests for this release.

@jbracher, @medewitt, @kathsherratt, @jamesmbaazam, @zsusswein, @TimTaylor, @sbfnk, @natemcintosh, @pearsonca, @Bisaloo, @parksw3, @adrian-lison, and @seabbs reported bugs, made suggestions, or contributed to discussions that led to improvements in this release.

Breaking changes

The default of max_delay in enw_preprocess_data() has been changed to be the maximum observed delay in the input data rather than being 20 days. When this default is used a warning is now thrown, and in general users should set max_delay based on their data and application. See the documentation of enw_preprocess_data() for more details. See #224 by @adrian-lison and reviewed by @seabbs.

Bugs

Fixed a bug identified by @jbracher where the enw_expectation() module was not appropriately defining initial conditions when multiple groups were present. This issue was related to recent changes in cmdstan 2.32.1 and is required in order to use versions of cmdstan beyond 2.32.0 with models that contain multiple time series. See #282 by @seabbs and self-reviewed.
Fixed a few typos in the model vignette. See #292 by @medewitt and reviewed by @seabbs.
Fixed a bug where snapshots (i.e. as returned as metadata in enw_preprocess_data()) were defined based on report vs reference date. This won’t have impacted most usage but was a problem when trying to fit a model to retrospective (and so completely reported) data. See #312 by @seabbs and self-reviewed.
Fixed a bug where a non-data.table passed to enw_quantile_to_long() could throw an error. See #324 by @natemcintosh and reviewed by @pearsonca.
Fixed a bug where enw_aggregate_cumulative() initialised its time step from the first reference date + 1 day rather than the first reference date. See #336 by @seabbs and self-reviewed.
Fixed a bug in enw_filter_reference_dates() when using remove_days on data with missing reference dates. See #351 by @adrian-lison and reviewed by @seabbs.
Resolved code quality issues related to the use of %in% with a scalar on the right hand side and other similar issues. See #382 by @seabbs and reviewed by @pearsonca and @Bisaloo.

Package

Search functionality on pkgdown website no longer directs to non-existent pages. This issue resulted from an incorrect URL being specified in the pkgdown configuration file. See #449 by @Bisaloo, based on a report from @zsusswein.
pkgdown theming elements have moved to an organization-level pkgdown theme to increase re-usability and DRY-ness across the organization. See #419 by @Bisaloo and reviewed by @pearsonca and @seabbs.
lintr checks are now also run on the tests/ directory. See #418 by @Bisaloo and reviewed by @seabbs.
Fixed some typos in single-timeseries-rt-estimation.Rmd. The WORDLIST used by spelling has also been updated to eliminate false positives. Future typos will now generate an error in the continuous integration check so that we can catch them as early as possible. See #341 by @Bisaloo and reviewed by @seabbs.
Added extra checks in continuous integration tests: we now test that partial matching is not used and that global state is left unchanged (or restored correctly). See #338 by @Bisaloo and reviewed by @seabbs.
Added additional tests to ensure that the enw_expectation() module is appropriately defining initial conditions when multiple groups are present. See #282 by @seabbs and self-reviewed.
Added an integration test for epinowcast() to check models with multiple time series can be fit as expected on example data. See #282 by @seabbs and reviewed by @adrian-lison.
Added a {touchstone} benchmark that includes multiple time-series to ensure that this functionality is appropriately tested. See #282 by @seabbs and reviewed by @adrian-lison.
Added the merge_group option to all required GitHub Actions. This enables the use of a merge queue for pull requests. See #300 by @seabbs and self-reviewed.
Added an internal check_group_date_unique() function which ensures that user supplied groups result in unique combinations of group and dates. This function is used in enw_preprocess_data() and enw_complete_dates() to ensure that the user supplied groups are valid. See #295 by @adrian-lison and reviewed by @seabbs.
Added support for non-daily reference date models (i.e., process models). For example, this allows modelling weekly data as weekly. This may be desirable when delays are very long, when computational resources are limited, or when it is not possible to specify a sufficiently flexible daily model to account for observed reporting patterns in either reference or report dates. As the model is unitless this entails no changes to the model itself. See #303 by @seabbs and self-reviewed.
Added a new helper function simulate_double_censored_pmf() which helps users define “correct” probability mass functions for double censored delays based on work in epidist by @parksw3 and @seabbs. Note this function is likely to be spun out into its own package in the near future. See #312 by @seabbs and self-reviewed.
Added a min_reference_date argument to enw_aggregate_cumulative() to allow users to specify the minimum reference date to include in the output. This is useful when users want to aggregate to a timestep with a specified initialisation date that is not the default. For example, if a user’s data is already reported with a weekly cadence they would use min(data$report_date) + 1 to preserve that timestep. See #340 by @seabbs and reviewed by @natemcintosh.
Added support to enw_complete_dates() for min_date and max_date arguments. These arguments allow users to specify the minimum and maximum dates to include in the output. This may be useful to users who want to ensure that their data is complete for a specified time period. See #340 by @seabbs and reviewed by @natemcintosh.
Added a new helper function enw_one_hot_encode_feature() for one hot encoding variables and binding them into the original data. This is useful when users want to include parts of variables in their models as binary indicators - for example giving a specific delay its own effect. See #348 by @seabbs and self-reviewed.
Enabled compiling with multithreading by default as this was found to cause no deterioration in performance even with 1 thread per chain. The likelihood calculation is now no longer parallelised when threads_per_chain = 1 which should offer a small performance improvement. See #366 by @sbfnk and reviewed by @seabbs.
Added a new action to check that the cmdstan model can be compiled and has the correct syntax. This runs on pull requests whenever stan code is changed, when code is merged onto main with altered stan code, and on a weekly schedule against the latest main branch. See #386 by @seabbs.
Switched to the cli package for all package messaging in order to have modern and pretty notifications. See #188 by @nikosbosse and @seabbs and reviewed by @pearsonca.
Increased the minimum supported R version to >= R 3.6.0 from R 3.5.0 and ensured that existing function code and tests compiled with this dependency. Vignettes will continue to allow use of R >= 4.1.0 syntax (i.e., native pipe and lambda function syntax). See #389 by @medewitt and @seabbs and reviewed by @pearsonca.
Added documentation for all custom stan functions. See #422 by @seabbs and reviewed by @sbfnk.
Added a function check_max_delay() which allows users to obtain coverage statistics for the assumed maximum delay based on the observed data. Enhanced postprocessing functions to accept a different max_delay than used in the model, by adding artificial samples/summaries for not-modeled dates. Further improved documentation and warnings around max_delay. See #224 by @adrian-lison and @seabbs and reviewed by @seabbs.
Exposed enw_stan_to_r() to the user. This function is used for testing and in development to expose epinowcast stan code in R. Users may find this function useful as it allows them to explore the stan code used in epinowcast models more easily. Note that this functionality is known to be unstable when rstan is loaded in the same R session. See #431 by @seabbs and reviewed by @sbfnk.
Refactored extract_sparse_matrix() to allow us to drop our rstan dependency. See #431 by @seabbs and reviewed by @sbfnk.
Allow for caching Stan models across R sessions to reduce compilation time through the use of the environment variable, enw_cache_location, which can be set using the set_enw_cache() function. See #407 by @medewitt and @seabbs and reviewed by @sbfnk and @pearsonca.
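The one-hot encoding entry above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the code behind enw_one_hot_encode_feature(); the helper name one_hot_encode, the indicator column naming, and the toy rows are all assumptions made for illustration.

```python
def one_hot_encode(rows, feature):
    """Return copies of `rows` with one 0/1 indicator column per level of `feature`.

    Hypothetical helper for illustration only; not part of epinowcast.
    """
    levels = sorted({row[feature] for row in rows})
    encoded = []
    for row in rows:
        new_row = dict(row)  # keep the original columns alongside the indicators
        for level in levels:
            new_row[f"{feature}_{level}"] = int(row[feature] == level)
        encoded.append(new_row)
    return encoded


# Toy data: three reports with delays of 0, 1 and 3 days.
rows = [
    {"delay": 0, "confirm": 10},
    {"delay": 1, "confirm": 4},
    {"delay": 3, "confirm": 1},
]
encoded = one_hot_encode(rows, "delay")
```

Each indicator column (for example delay_3) could then be used on its own in a model formula, which is the "specific delay with its own effect" use case the entry describes.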

Model

Update the internal handling of PMF discretisation to assume a uniform window of two days centred on the delay of interest rather than a window of one day starting on the delay of interest. This better approximates the underlying continuous distribution with primary and secondary event censoring. Due to this change models may perform slightly differently between versions and any delay distribution estimates will have means that are half a day longer (note this corrects the previous bias). See #288 by @seabbs and reviewed by @adrian-lison.
Updated the default prior for initialising the model to include the ascertainment rate which is inferred from the latent reporting delay distribution as this can be an improper probability mass function (i.e. one that does not sum to 1). See #312 by @seabbs and self-reviewed.
Added support for non-parametric reference date models as well as mixed models with both parametric and non-parametric reference date models. This enables the use of popular models such as the discrete time cox proportional hazards model. See #313 by @seabbs and self-reviewed.
Added support for missing data (excluding in the missing reference date model) using the observation_indicator argument to enw_obs(). Support was also added to enw_complete_dates() to flag missing data and as part of this new helper functions (enw_flag_observed_observations() and enw_impute_na_observations()) were also added. This support is likely most useful when used in conjunction with a known reporting structure. See #327 by @seabbs and self-reviewed.
Added support for using a maximum delay that is longer than the largest observed delay in the data. This may be useful at the start of an outbreak, when the data is sparse and the user expects delays longer than what has been observed so far. Note that because this requires extrapolating the delay distribution beyond the support of the data, users should be cautious when using this feature. A new example, inst/examples/germany_max_delay_greater_than_data.R, has been added to demonstrate this feature. See #346 by @seabbs and reviewed by @adrian-lison.
Added the priors used for model fitting to the <epinowcast> object. The object returned by epinowcast() now has a variable called priors and can be accessed for inspection and downstream analyses. See #399 by @jamesmbaazam and reviewed by @pearsonca and @seabbs.
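The discretisation change at the top of this section can be made concrete with a small numerical sketch. It is written in Python, is not the package's Stan implementation, and rests on one assumed reading of the entry: the old scheme assigns delay d the mass in [d, d + 1), the new scheme assigns it half of the mass in [d - 1, d + 1). The lognormal parameters and the truncation at 200 days are arbitrary illustrative choices.

```python
import math


def lognormal_cdf(x, meanlog, sdlog):
    """CDF of a lognormal distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def pmf_one_day_window(cdf, max_delay):
    """Mass for delay d taken from the one-day window [d, d + 1)."""
    return [cdf(d + 1) - cdf(d) for d in range(max_delay + 1)]


def pmf_two_day_window(cdf, max_delay):
    """Mass for delay d taken from the two-day window [d - 1, d + 1),
    halved because neighbouring windows overlap by one day."""
    return [(cdf(d + 1) - cdf(d - 1)) / 2.0 for d in range(max_delay + 1)]


def pmf_mean(pmf):
    """Mean delay implied by a PMF indexed from delay 0."""
    return sum(d * p for d, p in enumerate(pmf))


def cdf(x):
    # Illustrative continuous delay distribution (assumed parameters).
    return lognormal_cdf(x, meanlog=1.5, sdlog=0.5)


old_pmf = pmf_one_day_window(cdf, max_delay=200)
new_pmf = pmf_two_day_window(cdf, max_delay=200)
mean_shift = pmf_mean(new_pmf) - pmf_mean(old_pmf)
```

Under these assumptions both PMFs sum to one (up to truncation) and mean_shift is half a day, which is the size of the difference between versions that the entry mentions.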

Documentation

Updated the distributions vignette to match the updated handling of discretisation. See #288 by @seabbs and reviewed by @adrian-lison.
Updated the use of the citation() function in the README so that the command is shown to users and the output is treated like normal text. See #272 by @seabbs and self-reviewed.
Added a vignette walking through how to estimate the effective reproduction number in real-time (and comparing this to retrospective estimates) on a data source that is right truncated. See #312 by @seabbs and self-reviewed.
Switched to using bookdown for pkgdown vignettes and moved to the flatly theme for pkgdown rather than the preferably theme. See #312 by @seabbs and self-reviewed.
Updated the README to include the non-parametric reference date model as an option and also added a new example showing how to use this model. See #313 by @seabbs and self-reviewed.
Added a new example showcasing how to fit a model to data reported weekly with a 3 day delay until any reports are non-zero with a weekly process model and a mixture of a parametric and non-parametric reference date model. See #348 by @seabbs and self-reviewed.
Split README to focus on package-level issues and moved quick start into a getting started vignette. See #375 by @pearsonca and reviewed by @jamesmbaazam and @seabbs.
Added code in the CITATION file to automatically pull relevant citation fields from the DESCRIPTION file. Also added a GitHub Actions workflow to auto-generate a citation.cff file whenever CITATION or DESCRIPTION change. This way, all three files will always be up to date. See #369 by @jamesmbaazam and reviewed by @seabbs.
Removed the reference in the pull request template to updating the development version as this has been found to cause issues when multiple pull requests are open at once. See #391 by @seabbs and reviewed by @Bisaloo.
Added a note to the Getting Started vignette to clarify usability with alternatives to data.table. See #406 by @kathsherratt and reviewed by @seabbs.
Added a new vignette to provide users with a configuration and troubleshooting guide for Stan while working with epinowcast. See #405 by @medewitt and reviewed by @seabbs, @zsusswein, and @pearsonca.
Removed named individuals from vignettes and moved to team authorship. See #421 by @seabbs and self-reviewed.
Improved documentation of the maximum delay in the stan code. See #425 by @adrian-lison and reviewed by @seabbs.

Deprecations

enw_delay_filter(): Deprecated with a warning in favour of enw_filter_delay(). This renaming is to better reflect the function’s purpose. See #365 by @kathsherratt and reviewed by @seabbs.

epinowcast 0.2.2

This is a minor release that fixes a bug in the handling of optional initial conditions that was introduced by a recent change in cmdstan 2.32.1. Upgrading is recommended for all users who wish to use versions of cmdstan beyond 2.32.0. In addition to fixing this issue, the release also includes some minor documentation and vignette improvements, along with enhancements in input checking.

Contributors

@sbfnk and @seabbs contributed code to this release.

@seabbs reviewed pull requests for this release.

@sbfnk and @seabbs reported bugs, made suggestions, or contributed to discussions that led to improvements in this release.

Bugs

Improved the handling of optional initial conditions so that they are consistently passed as arrays to stan as required by cmdstan 2.32.1. This fix is required in order to use versions of cmdstan beyond 2.32.0. See #276 by @seabbs and self-reviewed.

Package

Added input checking for max_delay in enw_preprocess_data() to ensure that the maximum delay is greater than or equal to 1 and that it can be coerced to be an integer. See #274 by @sbfnk and reviewed by @seabbs.

Documentation

Improved the discrete delay distributions vignette including escaping functions to improve readability and right-closing discretised bins. See #275 by @sbfnk and reviewed by @seabbs.
Improved the documentation for max_delay in enw_preprocess_data() and fixed a typo in the same documentation. See #274 by @sbfnk and reviewed by @seabbs.

epinowcast 0.2.1

In this release, we focused on improving the internal code structure, documentation, and development infrastructure of the package to make it easier to maintain and extend functionality in the future. We also fixed a number of bugs and made some minor improvements to the interface. These changes included extending test and documentation coverage across all package functions, improving internal data checking and internalization, and removing some deprecated functions.

While these changes are not expected to impact most users, we recommend that all users upgrade to this version. We also suggest that users who have fitted models with both random effects and random walks should refit these models and compare the output to previous fits in order to understand the impact of a bug in the specification of these models that was fixed in this release.

This release lays the groundwork for planned features in 0.3.0 and 0.4.0 including: support for non-parametric delays, non-daily data with a non-daily process model (i.e. weekly data with a weekly process model), additional flexibility specifying generation times and latent reporting delays, improved case studies, and adding support for forecasting.

Full details on the changes in this release can be found in the following sections or in the GitHub release notes. To see the development timeline of this release see the 0.2.1 project.

Contributors

@adrian-lison, @Bisaloo, @pearsonca, @FelixGuenther, @Lnrivas, @seabbs, @sbfnk, and @jhellewell14 made code contributions to this release.

@pearsonca, @Bisaloo, @adrian-lison, and @seabbs reviewed pull requests for this release.

@Gulfa, @WardBrian, @parksw3, @adrian-lison, @Bisaloo, @pearsonca, @FelixGuenther, @Lnrivas, @seabbs, @sbfnk and @jhellewell14 reported bugs, made suggestions, or contributed to discussions that led to improvements in this release.

Potentially breaking changes

enw_add_pooling_effect(): replaced string argument with ... argument, to enable passing arbitrary arguments to the finder_fn argument. The same general usage is supported, but now e.g. the default argument to supply is prefix = "somevalue" vs string = "somevalue" and argument positions have changed. This function is primarily for internal use and we expect only a small subset of advanced users who are creating models outside the currently supported formula interface to be impacted. See #222 by @pearsonca and reviewed by @seabbs.
enw_dates_to_factors(): Deprecated and removed as no longer needed. We expect this function had little to no external use and so there should be little impact on users. See #216 by @seabbs and reviewed by @adrian-lison.

Bugs

Fixed a bug first highlighted by @Gulfa in #166 and localised during the investigation for #223 where random effects and random walks were being improperly constructed in enw_formula() so that their variance parameters were not shared between the correct parameters when used together. This only impacts models that used formulas with both random effects and random walks and for these models appears to have led to increased run-times, fitting issues, and potentially unreliable posterior estimates but to have had a less significant impact on actual nowcasts. We suggest refitting these models and comparing the output to previous fits in order to understand the impact on your usage. See #228 by @seabbs and self-reviewed.
Fixed a bug in enw_replace_priors() where the function could not deal with epinowcast summarised posterior estimates due to the new use of the pillar class. Added tests to catch if this issue reoccurs in the future. See #228 by @seabbs and self-reviewed.
Fixed an issue (#198) with the interface for scoringutils. For an unknown reason our example data contained pillar classes (likely due to an upstream change). This caused an issue with internal scoringutils that was using implicit type conversion (see here). See #201 by @seabbs and reviewed by @pearsonca.
Fixed a bug in enw_plot_quantiles() where the documented default for log was FALSE but the actual default was TRUE. See #209 by @seabbs and self-reviewed.
Fixed a bug in enw_expectation() where when models were specified with zero intercept an initial condition was still being specified for the intercept of the growth rate (expr_r_int, #246). This was not flagged as an issue by cmdstan 2.31.0 but as of cmdstan 2.32.0, due to improvements in how initial conditions were being read in (stan-dev/stan#3182), it throws an error causing models to fail. Solution suggested by @WardBrian, implemented in #255 by @seabbs, and reviewed by @pearsonca.

Deprecations

enw_incidence_to_cumulative(): Deprecated with a warning in favour of enw_add_cumulative(). This renaming is to better reflect the function’s purpose. enw_incidence_to_cumulative() will be removed in 0.3.0. See #247 by @seabbs and reviewed by @pearsonca.
enw_cumulative_to_incidence(): Deprecated with a warning in favour of enw_add_incidence(). This renaming is to better reflect the function’s purpose. enw_cumulative_to_incidence() will be removed in 0.3.0. See #247 by @seabbs and reviewed by @pearsonca.

Package

Fixed some typos in README.md, NEWS.md, the model.Rmd vignette and convolution_matrix() documentation. The WORDLIST used by spelling has also been updated to eliminate false positives. See #221 by @Bisaloo and reviewed by @seabbs and @adrian-lison.
Added more non-default linters in .lintr configuration file. This file is used when lintr::lint_package() is run or in the new lint-changed-files.yaml GitHub Actions workflow. See #220 by @Bisaloo and reviewed by @pearsonca and @seabbs.
Switched to the lint-changed-files.yaml GitHub Actions workflow instead of the regular lint.yaml to avoid annotations unrelated to the changes made in the PR. See #220 by @Bisaloo and reviewed by @pearsonca and @seabbs.
Added tests for summary.epinowcast() and plot.epinowcast() methods. See #209 by @seabbs and reviewed by @pearsonca.
Added tests for enw_plot_obs() where not otherwise covered by plot.epinowcast() tests. See #209 by @seabbs and reviewed by @pearsonca.
Refactored to consolidate data checking and internalization into a single internal function coerce_dt(), addressing issues #242, #241, #214, and #149. This eliminates the need for add_group(), check_by(), and check_dates() (and associated documentation, tests - some of these were intermediate capabilities introduced within this minor version; see #208) which have all been removed. Also starts to enable internal versus external use of exposed methods with the copy = ... argument. See #239 by @pearsonca, reviewed by @seabbs.
Resolved the spurious test warnings for snapshot tests which were linked to unstated formatting requirements. See #208 by @seabbs and reviewed by @pearsonca.
Removed unused internal plot helpers. See #217 by @seabbs and reviewed by @adrian-lison.
Added tests for all internal check_ functions used to check inputs. See #217 by @seabbs and reviewed by @adrian-lison.
Removed the problematic double specification of default arguments for target_date in enw_metadata() as flagged in #212 by @pearsonca using formals() to instead detect the default values from the function specification. See #232 by @seabbs and self-reviewed.
In the words of Jenny Bryan: “there is no else, there is only if.” Having else after return() or stop() increases the number of branches in the code, which makes it harder to read. It also translates into a higher cyclomatic complexity. We have removed all else statements after return() and stop() in the package. See #229 by @Bisaloo and reviewed by @seabbs.
Removed the internal definition of no_contrasts in enw_formula() as this was unused. Identified by @Bisaloo in #220 and raised in #223. See #228 by @seabbs and self-reviewed.
Added tests for enw_replace_priors() to check that it can handle epinowcast summarised posterior estimates. See #228 by @seabbs and self-reviewed.
Added a prefix (rw__) in enw_formula() and construct_rw() to indicate when a random effect variance is a random walk versus a random effect. See #228 by @seabbs and reviewed by.
Added support for using the same variable as both a random effect and a random walk. In most settings this is not advised. See #228 by @seabbs and self-reviewed.
Added an error message to construct_rw() when a random walk is specified for a variable that is not a numeric variable. See #228 by @seabbs and self-reviewed.
Added support for preprocessing and model fitting benchmarking using touchstone based on the implementation in EpiNow2 by @sbfnk. See #200 by @seabbs, @adrian-lison, @sbfnk, and self-reviewed.
Added a complete set of data converters to map between line list (i.e. each row is a case) and count data (i.e incidence and cumulative counts by reference and report date). In particular, this will help workflows where individual line list data is available as it can now be formatted ready for preprocessing using a single call to enw_linelist_to_incidence() which previously took several steps. See #247 by @seabbs and @jhellewell14 and reviewed by @pearsonca.
Dropped the use of the develop branch for development versions of the package. This change was discussed in #250 with the major motivator being that since the introduction of release only builds to R Universe we no longer need to have a stable main branch of GitHub to control our releases. See #256 by @seabbs and reviewed by @Bisaloo and @pearsonca.
Cleaned enw_formula_as_data_list() to better align with DRY principles. See #245 by @Lnrivas, reviewed by @pearsonca, @Bisaloo, and @seabbs.
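The line list to count conversion described in the data converters entry above can be sketched in a few lines. This is an illustration in Python, not the package's R implementation; the helper names linelist_to_incidence and add_cumulative, the column names new_confirm and confirm, and the toy cases are assumptions chosen to mirror the entry.

```python
from collections import Counter
from datetime import date


def linelist_to_incidence(linelist):
    """Count cases by (reference_date, report_date); one input row per case."""
    counts = Counter(
        (case["reference_date"], case["report_date"]) for case in linelist
    )
    return [
        {"reference_date": ref, "report_date": rep, "new_confirm": n}
        for (ref, rep), n in sorted(counts.items())
    ]


def add_cumulative(incidence):
    """Add a running total of `new_confirm` within each reference date.

    Assumes rows are sorted by reference date and then report date.
    """
    totals = Counter()
    out = []
    for row in incidence:
        totals[row["reference_date"]] += row["new_confirm"]
        out.append({**row, "confirm": totals[row["reference_date"]]})
    return out


# Toy line list: four cases, three with the same reference date.
linelist = [
    {"reference_date": date(2021, 1, 1), "report_date": date(2021, 1, 1)},
    {"reference_date": date(2021, 1, 1), "report_date": date(2021, 1, 2)},
    {"reference_date": date(2021, 1, 1), "report_date": date(2021, 1, 2)},
    {"reference_date": date(2021, 1, 2), "report_date": date(2021, 1, 2)},
]
incidence = linelist_to_incidence(linelist)
cumulative = add_cumulative(incidence)
```

Keeping the counting step and the cumulative step separate mirrors the split between incidence and cumulative counts that the entry describes.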

Documentation

Added examples for summary.epinowcast() and plot.epinowcast() methods to the documentation. See #209 by @seabbs and reviewed by @pearsonca.
Extended documentation, examples, and tests for internal, preprocessing, and postprocessing functions. See #208 by @seabbs and reviewed by @pearsonca.
Added examples for all plot functions. See #209 by @seabbs and reviewed by @pearsonca.
Added an example for enw_replace_priors() showing how to use a nowcast posterior to update the default priors. See #228 by @seabbs and self-reviewed.
Updated the package citation and documentation to include all new authors as of the 0.2.1 release and to use the recommended bibentry() approach. See #236 and #237 by @seabbs and reviewed by @Bisaloo.
Added a package style guide (STYLE_GUIDE.md) to document the style conventions used in the package. See #64 by @seabbs and reviewed by @pearsonca and @Bisaloo.
Improved and extended documentation of discretized, parametric delay distributions. Changed structure of package vignettes (into two categories, model definition vignettes and case study vignettes). See #265 by @FelixGuenther and @adrian-lison and reviewed by @seabbs.
Improved and extended the README quick start after feedback from @parksw3 in #260. See #267 by @seabbs and reviewed by @adrian-lison and @parksw3.

epinowcast 0.2.0

This release adds several extensions to our modelling framework, including modelling of missing data, flexible modelling of the generative process underlying case counts, an optional renewal equation-based generative process (enabling direct estimation of the effective reproduction number), and convolution-based latent reporting delays (enabling the modelling of both directly observed and unobserved delays as well as partial ascertainment). Much of the methodology used in these extensions is based on work done by Adrian Lison and is currently being evaluated.

On top of model extensions this release also adds a range of quality of life features, such as helper functions for constructing convolution matrices and combining probability mass functions. It also comes with improved computational efficiency, thanks to a refactoring of the hazard model computations to the log scale and extended parallelisation of the likelihood that is optimised for the structure of the input data. We have also extended the package documentation and streamlined the contribution process.

As a large-scale project, the package remains in an experimental state, though it is sufficiently stable for both research and production usage. More core development is needed to improve post-processing, pre-processing, and documentation coverage, and to evaluate optimal configurations in different settings. Please see our community site, contributing guide, and list of issues/proposed features if interested in being involved (any scale of contribution is warmly welcomed including user feedback, requests to extend our functionality to cover your setting, and evaluating the package for your context). This is a community project that needs support from its users in order to provide improved tools for real-time infectious disease surveillance.

We thank @adrian-lison, @choi-hannah, @sbfnk, @Bisaloo, @seabbs, @pearsonca, and @pratikunterwegs for code contributions to this release. We also thank all community members for their contributions including @jhellewell14, @FelixGuenther, @parksw3, and @jbracher.

Full details on the changes in this release can be found in the following sections.

Package

Added .Rhistory to the .gitignore file. See #132 by @choi-hannah.
Fixed indentations for authors and contributors in the DESCRIPTION file. See #132 by @choi-hannah.
Renamed enw_new_reports() to enw_cumulative_to_incidence() and added the reverse function enw_incidence_to_cumulative(). Both functions use a by argument to allow specification of variable groupings. See #157 by @seabbs.
Switched class checking to inherits(x, "class") rather than class(x) %in% "class". See #155 by @Bisaloo.
Changed enw_add_metaobs_features() interface to have holidays argument as a series of dates. Changed interface of enw_preprocess_data() to pass ... to enw_add_metaobs_features(). Interface changes come with internal rewrite and unit tests. As part of internal rewrite, introduces coerce_date() to R/utils.R, which wraps data.table::as.IDate() with error handling. See #151 by @pearsonca.
Changed the style of using match.arg for validating inputs. Briefly, the preference is now to define options via function arguments and validate with automatic match.arg idiom with corresponding enumerated documentation of the options. For this idiom, the first item in the definition is the default. This approach only applies to string-based arguments; different types of arguments cannot be matched this way, nor can arguments that allow for vector-valued options (e.g., if somearg = c("option1", "option2") were a legal argument indicating to use both options). See #162 by @pearsonca addressing issue #156 by @Bisaloo.
Refined the use of data ordering throughout the preprocessing functions. See #147 by @seabbs.
Skipped tests that use cmdstan locally to improve the developer/contributor experience. See #147 by @seabbs and @adrian-lison.
Added a basic simulator function for missing reference data. See #147 by @seabbs and @adrian-lison.
Added support for right hand side interactions as syntax sugar for random effects. This allows the specification of, for example, independent random effects by day for each strata of another variable. See #169 by @seabbs.
Added support for passing cpp_options to cmdstanr::cmdstan_model(). See #182 by @seabbs.
Added a function, convolution_matrix(), for constructing convolution matrices. See #183 by @seabbs.
Added a pass through from enw_model() to write_stan_files_no_profile() for the target_dir argument. This allows users to compile the model once and then share the compiled model across sessions rather than having to recompile each time the temporary directory is cleared. See #185 by @seabbs.
Added add_pmfs(), to sum probability mass functions into a new probability mass function. Initial implementation by @seabbs in #183, refactored by @pratikunterwegs in #187, following a suggestion in issue #186 by @pearsonca.
Added a warning when the observed empirical maximum delay is less than the specified maximum delay. See #190 by @seabbs.
Added nested support for converting array syntax in convert_cmdstan_to_rstan. See #192 by @sbfnk.

Model

Added support for parametric log-logistic delay distributions. See #128 by @adrian-lison.
Implemented direct specification of parametric baseline hazards. See #134 by @adrian-lison.
Refactored the observation model, the combination of logit hazards, and the effects priors to be contained in generic functions to make extending package functionality easier. See #137 by @seabbs.
Implemented specification of the parametric baseline hazards and probabilities on the log scale to increase robustness and efficiency. Also includes refactoring of these functions and reorganisation of inst/stan/epinowcast.stan to increase modularity and clarity. See #140 by @seabbs.
Introduced two new delay likelihoods delay_snap_lpmf and delay_group_lpmf. These stratify by either snapshots or groups. This is helpful for some models (such as the missingness module). The ability to choose which function is used has been exposed to the user in enw_fit_opts() via the likelihood_aggregation argument. Both of these functions rely on a newly added expected_obs_from_snaps function which vectorises expected_obs_from_index. See #138 by @seabbs and @adrian-lison.
Added support for supplying missingness model parameters to the model as well as optional priors and effect estimation. See #138 by @seabbs and @adrian-lison.
Refactored model generated quantities to be functional. See #138 by @seabbs and @adrian-lison.
Added support for modelling missing reference dates to the likelihood. See #147 by @seabbs and @adrian-lison.
Added additional functionality to delay_group_lpmf to support modelling observations missing reference dates. Also updated the generated quantities to support this mode. See #147 by @seabbs and @adrian-lison based on #64 by @adrian-lison.
Added a flexible expectation process on the growth rate scale. The default expectation model has been updated to a group-wise random walk on the growth rate. See #152 by @seabbs and @adrian-lison.
Added a deterministic renewal equation, and latent reporting process. See #152 and #183 by @seabbs and @adrian-lison.
Added support for no intercept in the expectation model and more general formula support to enable this as a feature in other modules going forward. See #170 by @seabbs.

Documentation

Removed explicit links to authors and issues in the NEWS.md file. See #132 by @choi-hannah.
Added a new example using simulated data and the enw_missing() model module. See #138 by @seabbs and @adrian-lison.
Update the model definition vignette to include the missing reference date model. See #147 by @seabbs and @adrian-lison.
Added the use of an expectation model to the “Hierarchical nowcasting of age stratified COVID-19 hospitalisations in Germany” vignette. See #193 by @seabbs.

Bugs

The probability-only model (i.e only a parametric distribution is used and hence the hazard scale is not needed) was not used due to a mistake specifying ref_as_p in the stan code. There was an additional issue in that the enw_report() module currently self-declares as on regardless of whether it is or not. This bug had no impact on results but would have increased runtimes for simple models. Both of these issues were fixed in #142 by @seabbs.
The addition of meta features week and month did not properly sequentially number weeks and months when time series crossed year boundaries. This would impact models that included effects expecting those to in fact be sequentially numbered (e.g. random walks). Fixed in #151 by @pearsonca.
#151 also corrects a minor issue with enw_example() pointing at an old file name when type="script". By @pearsonca.
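The week and month numbering bug fixed above is easy to see in a small example. The sketch below is in Python and is not the package's implementation; the helper names sequential_month and sequential_week, the choice of origin date, and the definition of a week as a 7-day block from that origin are assumptions for illustration.

```python
from datetime import date


def sequential_month(d, origin):
    """Calendar months elapsed since `origin`, counted across year boundaries."""
    return (d.year - origin.year) * 12 + (d.month - origin.month)


def sequential_week(d, origin):
    """Whole 7-day blocks elapsed since `origin`."""
    return (d - origin).days // 7


origin = date(2021, 12, 1)
dates = [date(2021, 12, 15), date(2022, 1, 5), date(2022, 2, 1)]

# Calendar month numbers wrap from 12 back to 1 at the year boundary.
calendar_months = [d.month for d in dates]
# Sequential indices keep increasing, which is what a random walk over
# months or weeks needs.
months = [sequential_month(d, origin) for d in dates]
weeks = [sequential_week(d, origin) for d in dates]
```

Here calendar_months wraps around at the new year while months and weeks keep increasing, which is the property that effects such as random walks rely on.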

epinowcast 0.1.0

This is a major release focusing on improving the user experience, and preparing for future package extensions, with an increase in modularity, development of a flexible and full-featured formula interface, and hopefully future-proofing as far as possible. This prepares the ground for future model extensions which will allow a broad range of real-time infectious disease questions to be better answered. These extensions include:

Modelling missing data (#43).
Non-parametric modelling of delay and reference date logit hazard (#4).
Flexible expectation modelling (#5).
Forecasting beyond the horizon of the data (#3).
Known reporting structures (#33).
Renewal equation-based reproduction number estimation (potentially part of #5).
Latent infections (i.e as implemented in other packages such as EpiNow2, epidemia, etc.).
Convolution-based delay models (i.e hospitalisations and deaths) with partially reported data.
Additional observation models.

If interested in contributing to these features, or other aspects of package development (for example improving post-processing, the coverage of documentation, or contributing case studies) please see our contributing guide and/or just reach out. This is a community project that needs support from its users in order to provide improved tools for real-time infectious disease surveillance.

This release contains multiple breaking changes. If needing the old interface please install 0.0.7 from GitHub. For ease, we have stratified changes below into interface, package, documentation, and model changes. Note the package is still flagged as experimental but is in regular use by the authors.

@adrian-lison, @sbfnk, and @seabbs contributed to this release.

Interface

A fully featured and flexible formula interface has been added that allows the specification of fixed effects, lme4 random effects, and random walks. See #27 by @seabbs.
A major overhaul, as described in #57, to the interface of epinowcast() with a particular focus on improving the modularity of the model components (described as modules in the documentation). All of the package documentation and vignettes have been updated to reflect this new interface. See #112 by @seabbs.

Package

Renamed the package and updated the description to give more clarity about the problem space it focusses on. See #110 by @seabbs.
A new helper function enw_delay_metadata() has been added. This produces metadata about the delay distribution vector that may be helpful in future modelling. This prepares the way for #4 where this data.frame will be combined with the reference metadata in order to build non-parametric hazard reference and delay-based models. In addition to adding this function, it has also been added to the output of enw_preprocess_data() in order to make the metadata readily available to end-users. See #80 by @seabbs.
Two new helper functions enw_filter_reference_dates() and enw_filter_report_dates() have been added. These replace enw_retrospective_data() but allow users to similarly construct retrospective data. Splitting these functions out into components also allows for additional use cases that were not previously possible. Note that by definition it is assumed that a report date for a given reference date must be equal or greater (i.e a report cannot happen before the event being reported occurs). See #82 by @sbfnk and @seabbs.
The internal grouping variables have been refactored to reduce the chance of clashes with columns in the data.frames supplied by the user. There will also be an error thrown in case of a variable clash, making preprocessing safer. See #102 by @adrian-lison and @seabbs, which solves #99.
Support for preprocessing observations with missing reference dates has been added along with a new data object returned by enw_preprocess_data() that highlights this information to the user (alternatively can be accessed by users using enw_missing_reference()). In addition, these missing observations have been setup to be passed to stan in order to allow their use in modelling. This feature is in preparation of adding full support for missing observations (see #43). See #106 by @adrian-lison and @seabbs.
The discretised reporting probability function has been extended to handle delays beyond the maximum delay in three different ways: ignore, add to maximum, or normalize. The nowcasting model uses “normalise” though work on this is ongoing. See #113 by @adrian-lison and #121 by @seabbs.
Fixed an issue (#105) with cmdstan 2.30.0 where passing optimisation flags to stanc_options by default was causing a compilation error. These flags are no longer passed by default. See #117 by @sbfnk and @seabbs.
Addition of regression/integration tests against example data for epinowcast() and enw_preprocess_data() with convergence checking for several example nowcasting models. Lower level tests for model tools and model modules have also been added. See #112 by @seabbs.

Model

Added support for parametric exponential delay distributions (note that this is comparable to an intercept-only non-parametric hazard model) and potentially no parametric delay (though this will currently throw an error due to the lack of appropriate non-parametric hazard). See #84 by @seabbs.
Added support for a Poisson observation model though it is recommended that most users make use of the default negative binomial model. See #120 by @seabbs.
Updated the expectation random walk model to use a more efficient cumulative_sum implementation suggested by @adrian-lison in #98. See #103 by @seabbs.
Aligned the implementation of the overdispersion prior with the prior choice recommendations from the stan wiki. See #111 by @adrian-lison.
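The cumulative-sum form of the expectation random walk mentioned above can be illustrated in base R. This is only a sketch of the general idea, assuming a random walk defined by an initial value plus accumulated steps; the variable names are hypothetical and the package's version is written in stan.

```r
set.seed(1)
init <- 2                  # assumed initial log expectation
steps <- rnorm(10, 0, 0.1) # assumed random-walk innovations

# Loop form: each value is the previous value plus one step.
rw_loop <- numeric(length(steps) + 1)
rw_loop[1] <- init
for (t in seq_along(steps)) {
  rw_loop[t + 1] <- rw_loop[t] + steps[t]
}

# Vectorised form: the same walk as a single cumulative sum.
rw_cumsum <- init + c(0, cumsum(steps))
```

Both forms give the same trajectory; the cumulative-sum form replaces the explicit loop with one vectorised operation.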

Documentation

The model description has been updated to reflect the currently implemented model and to improve readability. The use of reference and report date nomenclature has also been standardised across the package. See #71 by @sbfnk and @seabbs.

Internals

Array declarations in the stan model have been updated. To maintain compatibility with expose_stan_fns() (which itself depends on rstan), additional functionality has been added to parse stan code in this function. See #74, #85, and #93 by @sbfnk and @seabbs.
Remove spurious warnings due to missing initial values for optional parameters. See #76 by @sbfnk and @seabbs.

epinowcast 0.0.7

Adds additional quality of life data processing so that the maximum number (max_confirm) of notifications is available in every row (for both cumulative and incidence notifications) and the cumulative and daily empirical proportion reported are calculated for the user during pre-processing (see #62 by @seabbs).
The default approach to handling reported notifications beyond the maximum delay has been changed. In 0.0.6 and previous versions, notifications beyond the maximum delay were silently dropped. In 0.0.7 this is now optional behaviour (set using max_delay_strat in enw_preprocess_data()) and the default is instead to add these notifications to the last included delay where present. This should produce more accurate long-term nowcasts when data are available, but means that reported notifications for the maximum delay need to be interpreted with this in mind. See #62 by @seabbs.
Adds some basic testing and documentation for preprocessing functions. See #62 by @seabbs.
Stabilises the calculation of expected observations by increasing the proportion of the calculation performed on the log scale. This reduces computation time, with most of the gain coming from switching to the neg_binomial_2_log family of functions (over their natural scale counterparts). See #65 by @seabbs.
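As a rough illustration of working on the log scale, the base R sketch below builds the log of the expected observations by adding logs rather than multiplying on the natural scale. All values and names are made up for illustration. Base R has no log-mean negative binomial, so the sketch exponentiates before calling dnbinom(), whereas stan's neg_binomial_2_log takes the log mean directly.

```r
final_counts <- c(120, 80, 45)   # assumed expected final counts
report_prob <- c(0.9, 0.5, 0.1)  # assumed reporting probabilities
obs <- c(110, 38, 6)             # assumed observed counts
phi <- 10                        # assumed overdispersion

# Log scale: a product of terms becomes a sum of logs.
log_exp_obs <- log(final_counts) + log(report_prob)
ll_log <- sum(dnbinom(obs, mu = exp(log_exp_obs), size = phi, log = TRUE))

# Natural scale: the same expectation formed by multiplication.
ll_nat <- sum(dnbinom(obs, mu = final_counts * report_prob, size = phi,
                      log = TRUE))
```

The two log-likelihoods agree; the practical benefit in stan comes from never leaving the log scale.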

epinowcast 0.0.6

Simplifies and optimises the internal functions used to estimate the parametric daily reporting probability. These are now exposed to the user via the distribution parameter, with both the Lognormal and Gamma families tested and working. Note that both families use their standard parameterisations as given in the stan manual (see #42 by @adrian-lison and @seabbs).
Add a profiling switch to model compilation, allowing profiling (https://mc-stan.org/cmdstanr/articles/profiling.html) to be toggled on/off in the same model. Also supports .stan files found in include_paths (see #41 and #54 by @adrian-lison).
Fully vectorise the likelihood by flattening observations and pre-specifying expected observations as a vector before calculating the log-likelihood (see #40 by @seabbs).
Adds vectorisation of zero truncated normal distributions (see #38 by @seabbs)
hazard_to_prob has been optimised using vectorisation (see #53 by @adrian-lison and @seabbs).
prob_to_hazard has been optimised so that only required cumulative probabilities are calculated (see #53 by @adrian-lison and @seabbs).
Updated to use the inv_sqrt stan function (see #60 by @seabbs).
Added support for scoringutils 1.0.0 (see #61 by @seabbs).
Added a basic example helper function, enw_example(), to power examples and tests based on work done in forecast.vocs (see #61 by @seabbs).
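The hazard_to_prob and prob_to_hazard entries above concern conversion between discrete-time hazards and reporting probabilities. A minimal base R sketch of the standard relationship follows. The *_sketch function names are hypothetical and this is not the package's stan code: the probability of reporting at a delay is the hazard at that delay multiplied by the probability of not having reported earlier.

```r
# Hypothetical sketch: hazards -> probabilities using a cumulative product
# of the "not yet reported" terms instead of a nested loop.
hazard_to_prob_sketch <- function(h) {
  n <- length(h)
  not_yet_reported <- c(1, cumprod(1 - h[-n]))
  h * not_yet_reported
}

# Hypothetical sketch: probabilities -> hazards. Only the cumulative
# probabilities up to the previous delay are needed.
prob_to_hazard_sketch <- function(p) {
  n <- length(p)
  not_yet_reported <- 1 - c(0, cumsum(p[-n]))
  p / not_yet_reported
}

h <- c(0.2, 0.5, 0.3, 1) # final hazard of 1: everything is reported by then
p <- hazard_to_prob_sketch(h)
h_back <- prob_to_hazard_sketch(p)
```

With a final hazard of one, the probabilities sum to one, and converting back recovers the original hazards.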

epinowcast 0.0.5

Convert retrospective data date fields to class IDate when utilising enw_retrospective_data() to solve an esoteric error.
Added full argument name for include_paths to avoid console chatter
Adds a stanc_options argument to enw_model() and specifies a new default of list("01") which enables simple pre-compilation optimisations. See here for details of these optimisations.
Remove inv_logit and logit, as the base R functions plogis and qlogis can be used instead.
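For reference, base R provides the inverse logit and logit transformations as plogis() and qlogis(), the distribution and quantile functions of the standard logistic distribution. A short sketch with arbitrary input values:

```r
x <- c(-2, 0, 1.5)

# Inverse logit: maps the real line to (0, 1).
p <- plogis(x)

# Logit: maps (0, 1) back to the real line.
x_back <- qlogis(p)

# The same transformations written out by hand.
p_manual <- 1 / (1 + exp(-x))
x_manual <- log(p / (1 - p))
```

Here plogis(0) is 0.5, and qlogis() inverts plogis() up to floating-point error.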

epinowcast 0.0.4

Add support for extracting and summarising posterior nowcast samples
Package spell check
Update read me quick start to use 40 days of delay vs 30
Add a section to the read me quick start showing an example of handling nowcast samples.
Add support for passing custom models and included files to enw_model().
Fix a bug where enw_summarise_samples() returned duplicate samples.
Add support for passing holidays as a variable and then adjusting by converting the holiday day into a custom day of the week (by default Sunday but this is set by the user).
Added support for scoring on both the natural and log scale. This represents absolute and relative scoring respectively.

epinowcast 0.0.3

Add support for passing in priors
Add case study vignette
Add model definition and implementation details.
Add support for out of sample scoring (using scoringutils).

epinowcast 0.0.2

Initial version of the package with broadly working functionality and first draft vignettes.

epinowcast 0.0.1

Initial package version with development code