To see whether a CAZ has reduced harmful air pollution, we can't simply compare the measured pollution before and after the CAZ was introduced. NO2 concentrations are the result of complex physical relationships between natural and man-made systems, and so vary considerably from day to day. Notably, average NO2 levels vary throughout the year (peaking in winter) and are further affected by local meteorological conditions. A model is therefore used to detrend the raw measurements, removing this seasonal and meteorological variation so that any remaining changes can be attributed to external factors such as policy interventions. In particular, we require a statistical model that can handle time-series data with external covariates, can track underlying changes, and provides a full probabilistic formulation of every parameter.

State-space modelling

A state-space model (SSM) is a statistical model of a time-series that defines the observed measurements as a linear function of one or more unobserved states (the observation equation), along with a fully specified model of the states' dynamics (the state equation). It is sometimes referred to as a Kalman filter, although strictly speaking the Kalman filter is the algorithm used to estimate the states of a linear Gaussian SSM. The SSM equations used here take the following form.

Observation equation: $$y_t = \alpha_t + \beta X_t + \epsilon_t$$
State equation: $$\alpha_t = \alpha_{t-1} + \eta_t$$
Where:
  • \(y_t\) = observed data of the outcome
  • \(\alpha_t\) = the underlying trend: part 1 of the state
  • \(\beta\) = regression coefficients for the external covariates: part 2 of the state
  • \(X_t\) = observed data of the external covariates
  • \(\epsilon_t\) = error unexplained by the model, assumed to have constant variance \(\Sigma_\epsilon\)
  • \(\eta_t\) = the random-walk step, i.e. how much the trend can change at each time-step, assumed to have constant variance \(\Sigma_\eta\). NB: \(\beta\) can be given a state equation of the same kind so that the regression coefficients are time-varying, but this is not used in this model
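
To make the two equations concrete, the following minimal sketch simulates data from a model of this form. It is illustrative only and is not taken from the dashboard's source code: the function name `simulate_ssm`, the single standard-normal covariate, and the Gaussian noise are all assumptions. Note that `sigma_eps` and `sigma_eta` are standard deviations, i.e. the square roots of \(\Sigma_\epsilon\) and \(\Sigma_\eta\).

```python
import random


def simulate_ssm(n_steps, beta, sigma_eps, sigma_eta, alpha0=0.0, seed=0):
    """Simulate y_t = alpha_t + beta * x_t + eps_t with a random-walk trend.

    Hypothetical helper: one covariate, Gaussian noise, standard deviations
    (not variances) as inputs.
    """
    rng = random.Random(seed)
    alpha = alpha0
    xs, alphas, ys = [], [], []
    for _ in range(n_steps):
        # State equation: the trend takes one random-walk step.
        alpha = alpha + rng.gauss(0.0, sigma_eta)
        # A single made-up covariate standing in for X_t.
        x = rng.gauss(0.0, 1.0)
        # Observation equation: trend + covariate effect + unexplained error.
        y = alpha + beta * x + rng.gauss(0.0, sigma_eps)
        xs.append(x)
        alphas.append(alpha)
        ys.append(y)
    return xs, alphas, ys


xs, alphas, ys = simulate_ssm(n_steps=100, beta=0.5, sigma_eps=0.2, sigma_eta=0.05)
```

In the simulation the trend `alphas` is known; in practice it is unobserved and has to be estimated from `ys` and `xs`.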

Detrending NO2

For the NO2 detrending specifically, these terms represent:
  • \(y_t\) = measured NO2
  • \(\alpha_t\) = underlying detrended NO2
  • \(X_t\) = meteorological measurements and temporal variables
  • \(\beta\) = coefficients for meteorological and temporal factors
  • \(\epsilon_t\) = error unexplained by the model
  • \(\eta_t\) = how much the NO2 detrended series can update each day (on a random walk)
\(\Sigma_\epsilon\) and \(\Sigma_\eta\) can be manually specified if they are known from mechanistic knowledge of the dynamics, but in this case they are automatically estimated using Maximum Likelihood Estimation.
Furthermore, a log transform is applied to the outcome to help stabilize the variance (NO2 is right-skewed) and to enforce positivity.
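
As a rough illustration of how the two variances can be estimated by maximum likelihood, the sketch below evaluates the Gaussian log-likelihood of a trend-only model with a Kalman filter and picks the best pair of variances from a coarse grid. It makes several simplifying assumptions that are not from the dashboard's code: the covariates are omitted, a grid search stands in for a numerical optimiser, the initial state variance `p0` is simply set to a large number, and the NO2 values are made up.

```python
import math


def local_level_loglik(y, var_eps, var_eta, a0=0.0, p0=1e6):
    """Gaussian log-likelihood of a trend-only (local-level) model."""
    a, p = a0, p0
    loglik = 0.0
    for obs in y:
        p = p + var_eta   # predict: the random walk inflates the state variance
        f = p + var_eps   # variance of the one-step-ahead forecast
        v = obs - a       # forecast error
        loglik += -0.5 * (math.log(2 * math.pi * f) + v * v / f)
        k = p / f         # Kalman gain
        a = a + k * v     # update the trend estimate
        p = (1 - k) * p   # update the trend variance
    return loglik


def fit_variances(y, grid):
    """Return the (var_eps, var_eta) pair on the grid with the highest likelihood."""
    pairs = [(ve, vn) for ve in grid for vn in grid]
    return max(pairs, key=lambda pair: local_level_loglik(y, *pair))


# Made-up daily NO2 concentrations, modelled on the log scale.
no2 = [20.0, 22.0, 19.0, 25.0, 24.0, 30.0, 28.0, 26.0]
log_no2 = [math.log(value) for value in no2]
var_eps, var_eta = fit_variances(log_no2, grid=[0.001, 0.01, 0.1])
```

Because the model is fitted to `log_no2`, exponentiating any fitted value always gives a positive concentration.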

Estimating the CAZ's impact

To quantify the impact of the CAZ on NO2 concentrations more formally than by simply observing a decrease in the detrended series, an intervention variable is added to the \(X_t\) covariate matrix, equal to 1 when the CAZ is in effect and 0 beforehand. The associated coefficient in \(\beta\) quantifies the effect of the CAZ on NO2, which is in relative/percentage terms since a log transform is used on the outcome. It is this value that is displayed as the purple time-series, along with its associated 95% confidence interval.
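
Because the outcome is modelled on the log scale, a coefficient \(\beta\) on the 0/1 intervention variable corresponds to a relative change of \(e^{\beta} - 1\). The sketch below shows this conversion. The function name and the example numbers are hypothetical, and the interval assumes a Normal approximation on the log scale, which may differ from how the dashboard computes its interval.

```python
import math


def log_effect_to_percent(beta, std_err, z=1.96):
    """Convert a log-scale coefficient to a % change with an approximate 95% interval."""

    def to_pct(b):
        return (math.exp(b) - 1.0) * 100.0

    # The interval is built on the log scale, then each end is transformed.
    return to_pct(beta), to_pct(beta - z * std_err), to_pct(beta + z * std_err)


# Made-up example: a coefficient of -0.10 with a standard error of 0.04.
estimate, lower, upper = log_effect_to_percent(-0.10, 0.04)
print(f"{estimate:.1f}% ({lower:.1f}% to {upper:.1f}%)")
```

A coefficient of -0.10 therefore corresponds to a reduction of roughly 9.5%, slightly less than the 10% a naive reading of the coefficient would suggest.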

Daily update

Every day at 6am GMT, the model is updated with the previous day's average concentration, providing a new estimate of both the detrended series and the intervention effect. An SSM can produce state estimates in two ways: with a filter or with a smoother. The filter only uses the data available up to the time of each estimate, i.e. the detrended estimate for the 15th January only uses data that was available on the 15th January, even if it's now the 30th January. A smoother, by contrast, uses all of the data, so on the 30th January, rather than just adding a detrended estimate for the 29th, the smoother would re-estimate the state for every day.

The smoother produces, well, smoother results, and is less susceptible to quick changes in trend. However, because it revises past estimates using later data, it feels slightly misleading to portray it as an online method, unlike the filter. In high-resolution time-series the filter is preferred for online situations since it is far less computationally demanding (it updates just one time point rather than all of them). Since this is only daily data, the smoother could be used here without any time limitations, but using the filter feels more in keeping with the stated purpose of a 'real-time' dashboard.
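
The difference between the two can be seen in a minimal sketch of a trend-only model (covariates omitted for brevity). The function names, variances and data below are illustrative assumptions rather than the dashboard's implementation, and the smoother shown is the standard Rauch-Tung-Striebel backward pass.

```python
def kalman_filter(y, var_eps, var_eta, a0=0.0, p0=1e6):
    """Filtered and one-step-predicted trend estimates for a local-level model."""
    a, p = a0, p0
    filt_a, filt_p, pred_a, pred_p = [], [], [], []
    for obs in y:
        a_pred, p_pred = a, p + var_eta   # predict from yesterday's estimate
        k = p_pred / (p_pred + var_eps)   # Kalman gain
        a = a_pred + k * (obs - a_pred)   # update using today's observation only
        p = (1 - k) * p_pred
        pred_a.append(a_pred)
        pred_p.append(p_pred)
        filt_a.append(a)
        filt_p.append(p)
    return filt_a, filt_p, pred_a, pred_p


def rts_smoother(filt_a, filt_p, pred_a, pred_p):
    """Backward pass that revises each filtered estimate using later data."""
    smooth_a = list(filt_a)
    for t in range(len(filt_a) - 2, -1, -1):
        gain = filt_p[t] / pred_p[t + 1]
        smooth_a[t] = filt_a[t] + gain * (smooth_a[t + 1] - pred_a[t + 1])
    return smooth_a


y = [3.0, 3.1, 2.9, 3.4, 3.6, 3.5]   # made-up log NO2 values
early = kalman_filter(y[:3], 0.05, 0.01)[0]
full = kalman_filter(y, 0.05, 0.01)
smoothed = rts_smoother(*full)

# Filtered estimates for the first three days are unchanged by later data.
print(full[0][:3] == early)
# The smoothed estimate for day three has been revised by days four to six.
print(smoothed[2] != full[0][2])
```

On the final day the two estimates coincide, since there is no later data for the smoother to use.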

Modelling the effect of UK Clean Air Zones on reducing NO2 concentrations

This website contains live dashboards to monitor the impact of Clean Air Zones (CAZ) on NO2 concentrations in multiple cities in the UK. It has been developed by researchers at the Wolfson Atmospheric Chemistry Laboratories at the University of York as part of a research project into techniques for identifying local changes in NO2 emissions arising from policy changes. The dashboard was set up prior to the introduction of CAZs in two UK cities: Newcastle (30th January 2023) and Sheffield (27th February 2023), providing a real-time online estimate of the CAZ's effectiveness. This provides a more realistic estimate of how much information can be gleaned in real-time, rather than from a post-hoc study carried out with the benefit of hindsight once a significant duration has passed. However, it also means there is additional uncertainty in the estimates, due both to the limited information contained in the data and to teething issues being identified in the methodology itself. As such, the following disclaimer is provided on each dashboard as a reminder that this is a work-in-progress and the estimates should not be used as-is without thoroughly understanding the limitations of the approach.

Disclaimer: the estimates shown here are not validated and are still undergoing active research, as such they should not be treated as definitive and should be viewed with caution.

A modelling-based approach is used to extract changes in the underlying NO2 concentrations, free of confounding factors such as local meteorology and seasonality. The resulting detrended series is used to identify the changes since the intervention took place. See the Methodology tab for full details of the modelling.


Please email me at stuart.lacy at york.ac.uk if you have any questions or would like to discuss this work.
The source code for both the state-space modelling and the Shiny web-app are publicly available on GitHub: https://github.com/wacl-york/ncaz