Choosing an Appropriate Time Step

General Considerations

Generally, a timestep should be as large as possible while meeting model verification criteria and producing all desired outputs. Larger timesteps are more desirable because they decrease computational time and resources required to run the model.

Some modeling features that may impact timestep duration decisions include:

  • Rate-based choices

    • See below.

  • Modeled events between specified time intervals. For example, a monthly timestep may be desired if simulants attend monthly medication sessions.

    • This is because Vivarium simulants can undergo only one transition in each cause/intervention model per timestep. If the timestep is very long, it is unrealistic to assume that only one event occurred during it: in all likelihood, something else would have happened to the simulant between that first event and the end of the timestep.

  • Duration-based choices: if the average duration of a given condition is 10 days, a timestep longer than 10 days may not be appropriate

    • This is because Vivarium person-time observers always assume that a simulant was in the same state for the entire timestep. If people in real life move between states more quickly than the timestep allows, this assumption becomes highly inaccurate and V&V criteria may not be met.
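To get a rough feel for the duration-based consideration, the sketch below estimates how many episodes of a condition would end within a single timestep. It assumes, purely for illustration, that durations are exponentially distributed around the stated average; the function name is hypothetical and is not part of Vivarium.

```python
import math


def probability_of_leaving_state(mean_duration: float, dt: float) -> float:
    """Chance that an episode ends within one timestep of length dt.

    Assumes exponentially distributed durations (an illustrative assumption).
    """
    return 1.0 - math.exp(-dt / mean_duration)


mean_duration_days = 10.0  # average duration from the example in the bullet above
for dt_days in (1.0, 10.0, 28.0):
    p = probability_of_leaving_state(mean_duration_days, dt_days)
    print(f"dt = {dt_days:>4} days: {p:.0%} of episodes end within one timestep")
```

Under this assumption, the share of episodes that end within one timestep grows quickly as the timestep approaches the average duration, which is when treating a simulant as being in one state for the whole timestep becomes least accurate.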

Todo

Add more detail to the explanation behind the second two bullets and brainstorm potential solutions to work around these constraints if necessary.

Relationship between timesteps and modeled rates in Vivarium

Vivarium relies on the following approximation where \(r\) is some rate and \(dt\) is the simulation time step duration:

\[1 - e^{-r \times dt} \approx r \times dt\]

Note

What’s going on is that we’re thinking of the time until an event as an exponentially distributed continuous random variable with rate \(r\). But in Vivarium this random variable gets discretized into a geometric random variable, I believe with parameter \(p = 1 - e^{-r \times dt}\). The mean of the exponential random variable is \(1/r\), whereas the mean of the geometric random variable, converted from time steps back to days, is \(dt/p\).
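The discretization described in this note can be checked empirically. The following is a minimal sketch, not Vivarium code: it assumes each simulant experiences the event with probability \(p = 1 - e^{-r \times dt}\) on every timestep and that the event is recorded at the end of the timestep in which it occurs. The function name and parameter values are hypothetical.

```python
import math
import random


def simulate_mean_waiting_time(rate: float, dt: float, n_simulants: int, seed: int = 0) -> float:
    """Average time to event when the event can only occur once per timestep."""
    rng = random.Random(seed)
    p = 1.0 - math.exp(-rate * dt)  # per-timestep event probability
    total_time = 0.0
    for _ in range(n_simulants):
        steps = 1
        while rng.random() >= p:  # no event on this timestep, so advance one more
            steps += 1
        total_time += steps * dt
    return total_time / n_simulants


rate = 0.1  # events per day (hypothetical)
dt = 5.0    # timestep in days (hypothetical)
p = 1.0 - math.exp(-rate * dt)

print("continuous mean 1/r :", 1.0 / rate)
print("geometric mean dt/p :", dt / p)
print("simulated mean      :", simulate_mean_waiting_time(rate, dt, n_simulants=20_000))
```

The simulated mean should land near \(dt/p\) rather than \(1/r\), and the gap between the two shrinks as \(dt\) gets smaller.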

Todo

Add more detail on why this is the case from an implementation stand point?

However, when \(r \times dt\) is not small (that is, when the rate is large relative to \(1/dt\)), this approximation becomes less accurate. When the approximation does not hold, the rates in Vivarium simulation outputs will not accurately reflect the desired rates in the simulation inputs, and model verification will not be successful.
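The size of this discrepancy can be sketched numerically. The example below assumes the per-timestep event probability is \(1 - e^{-r \times dt}\) and that the output rate is computed as events per unit of person-time, with each at-risk simulant contributing a full timestep. The function name and the input rates are illustrative only.

```python
import math


def output_rate(input_rate: float, dt: float) -> float:
    """Rate recovered from outputs: per-timestep event probability divided by dt."""
    probability = 1.0 - math.exp(-input_rate * dt)
    return probability / dt


dt = 1.0  # timestep in days (hypothetical)
for input_rate in (0.01, 0.1, 0.5, 1.0):
    observed = output_rate(input_rate, dt)
    shortfall = 1.0 - observed / input_rate
    print(f"r*dt = {input_rate * dt:4.2f}: output rate {observed:.4f}, shortfall {shortfall:.1%}")
```

The shortfall depends only on the product \(r \times dt\), so halving the timestep has the same effect on accuracy as halving the rate.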

Notably, while this approximation may hold at the population level, rates are heterogeneous at the individual level in Vivarium simulations. The approximation may therefore fail for particular subgroups whose outcome rates are elevated by high-risk exposures, which may still cause model verification to be unsuccessful.
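A quick way to look for this problem is to check the approximation separately for each subgroup. The sketch below is a simplification: it takes each subgroup's rate to be a baseline rate multiplied by a relative risk, and every value in it (baseline rate, relative risks, tolerance) is hypothetical.

```python
import math


def relative_shortfall(rate: float, dt: float) -> float:
    """Fraction by which the output rate falls short of the input rate."""
    x = rate * dt
    return 1.0 - (1.0 - math.exp(-x)) / x


baseline_rate = 0.02  # events per day among the unexposed (hypothetical)
relative_risks = {"unexposed": 1.0, "moderate exposure": 3.0, "high exposure": 15.0}
dt = 1.0              # timestep in days (hypothetical)
tolerance = 0.05      # acceptable shortfall (hypothetical verification criterion)

flagged = [
    group
    for group, relative_risk in relative_risks.items()
    if relative_shortfall(baseline_rate * relative_risk, dt) > tolerance
]
print("Subgroups exceeding the tolerance:", flagged)
```

With these made-up values only the high-exposure subgroup exceeds the tolerance, even though the baseline rate is comfortably within it.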

Theoretically, when modeling high rates, a small enough timestep (\(dt\)) may be selected to meet verification criteria. However, this may not be desirable, as smaller timesteps lead to longer run times and greater demands on computational resources. Therefore, the following solutions may be considered:

Rate adjustment

Artificially inflate rate \(r\) using the following equation:

\[r' = -\frac{1}{dt} \times \ln(1 - dt \times r)\]

This requires \(r \times dt < 1\); otherwise the logarithm is undefined. So then:

\[1 - e^{-r' \times dt} = r \times dt\]

Therefore, when \(r'\) is used as an input value in Vivarium, \(r\) will be output.

This strategy may be considered when a single parameter in a simulation (such as the remission rate of diarrheal diseases) violates the approximation and the timestep is otherwise well-suited for the model.
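The rate adjustment can be sketched in a few lines. This is an illustrative helper, not a Vivarium function: the name `adjusted_rate` and the example values are assumptions, and the per-timestep probability is taken to be \(1 - e^{-r' \times dt}\).

```python
import math


def adjusted_rate(rate: float, dt: float) -> float:
    """Inflated rate r' such that 1 - exp(-r' * dt) equals rate * dt."""
    x = rate * dt
    if not 0.0 <= x < 1.0:
        raise ValueError("rate * dt must be in [0, 1) for the adjustment to be defined")
    return -math.log1p(-x) / dt  # log1p(-x) is ln(1 - x), accurate for small x


rate = 0.2  # desired output rate per day (hypothetical)
dt = 2.0    # timestep in days (hypothetical)

r_prime = adjusted_rate(rate, dt)
probability = 1.0 - math.exp(-r_prime * dt)

print("adjusted input rate r':", r_prime)
print("per-timestep probability:", probability, "target r*dt:", rate * dt)
```

The explicit check matters because the adjustment has no solution once \(r \times dt\) reaches 1; in that situation a shorter timestep or one of the other strategies is needed.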

Cause exclusions

Mortality rates may vary dramatically by age, particularly when modeling children and the elderly. Additionally, excess mortality rates are definitionally higher than cause-specific mortality rates. Therefore, it may be desirable to avoid modeling excess mortality among these high-mortality age groups and to model cause-specific mortality instead.

This strategy was used in the child IV iron simulation, in which diarrheal diseases and lower respiratory infection causes among the neonatal age groups were included by their CSMR only (affected by the LBWSG risk factor) rather than through an incidence/remission model that included EMRs.

Asynchronous models

Sometimes, the difference in rates across age groups may be so great that it may be desirable to model them asynchronously. This strategy was used in the IV iron simulation, in which women of reproductive age were modeled separately from children, with a longer timestep for the adults than for the children. This strategy requires tracking output data from one of the models at the individual level so that it can be used as input to the other.