Risk Correlation
Background and motivation
At the individual level, risk exposures are frequently correlated. Examples include high body mass index and high fasting plasma glucose, tobacco smoking and alcohol use, and childhood height and weight.
Usually, Vivarium assigns each risk exposure to simulants independently, so that each exposure follows the desired population-level univariate distribution, which frequently comes from GBD. In the common case of a dichotomous exposure, this means that every simulant in the same age/sex/location group has the same probability of exposure, equal to the prevalence in that group.
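A minimal sketch of this independent assignment for a dichotomous exposure is shown below. The function name, the prevalence value, and the one-uniform-draw-per-simulant approach are illustrative assumptions, not Vivarium's actual API.

```python
import random


def assign_independent_exposure(n_simulants, prevalence, seed=0):
    """Assign a dichotomous exposure independently to each simulant.

    Hypothetical helper for illustration; not part of Vivarium.
    """
    rng = random.Random(seed)
    # Each simulant gets its own uniform draw and is exposed when the draw
    # falls below the population prevalence, so every simulant in the group
    # has the same probability of exposure.
    return [rng.random() < prevalence for _ in range(n_simulants)]


# Made-up prevalence for a single age/sex/location group.
exposed = assign_independent_exposure(n_simulants=100_000, prevalence=0.3)
observed_prevalence = sum(exposed) / len(exposed)
print(f"observed prevalence: {observed_prevalence:.3f}")
```

Because every draw is independent, a second exposure assigned the same way would be uncorrelated with this one regardless of how the two risks relate in reality.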
The correlation of risk exposures can be important to simulation results, especially when the risks affect the same outcome, when results are stratified by one of the risks, and/or when interventions are targeted based on one of the risks.
Todo
Add a more detailed description of when to include risk correlation, including which results are biased by the omission of a risk correlation and how to assess size/acceptability of the bias.
When modeling risk correlation may be important
In certain situations, correlation between modeled risk factors must be included in our simulations to avoid biased results. Whether omitting the correlation will cause bias depends on the independent variable (most often intervention coverage) and the dependent variable (most often DALYs averted) of the research question at hand, and on how the correlated risks relate to those variables. Beyond assessing whether bias will occur, it is also important to consider its expected magnitude, as small biases may be acceptable model limitations.
There are three common situations in which modeling correlation between two risk factors may be important in our simulations:
1. When there is correlation between two risk factors that each influence eligibility for the intervention. In this situation, modeling the correlation between these risks will influence the proportion of the population that is eligible to receive the intervention.
2. When a risk that influences intervention eligibility is correlated with a risk affected by the intervention. In this situation, modeling the correlation between these risks will influence the risk profile of the population who receives the intervention, thereby influencing the potential impact of the intervention (given that the intervention will presumably have a greater impact when delivered to a higher-risk population).
3. When there is correlation between two risks that affect the same outcome. In this situation, the correlation between these risks will influence the PAF calculation of these risks on the outcome, as discussed later on this page. If an intervention affects either or both of these risks, the estimated impact of the intervention on the outcome will also be affected.
The Jupyter notebook linked here contains more details on each of these situations. It also provides functions for estimating the potential magnitude of the bias from failing to consider the correlation between risk factors. These functions can be applied or adapted to specific research questions to help decide whether to model risk-risk correlation in a Vivarium simulation.
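As a rough illustration of the first situation (this is not one of the notebook's functions), the sketch below compares the share of a population that has both of two dichotomous risks with and without correlation. The helper name and all input values are made up. It relies on the identity for two Bernoulli variables that the probability of both equals the product of the prevalences plus the covariance.

```python
import math


def joint_prevalence(p1, p2, rho):
    """Probability of having both dichotomous risks, given their correlation.

    p1, p2: marginal prevalences; rho: Pearson (phi) correlation between the
    two exposure indicators. Hypothetical helper for illustration.
    """
    # For two Bernoulli variables, Cov = P(both) - p1 * p2 and
    # rho = Cov / (sd1 * sd2), so P(both) = p1 * p2 + rho * sd1 * sd2.
    p_both = p1 * p2 + rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    # Not every rho is attainable for given marginals (Frechet bounds).
    lower, upper = max(0.0, p1 + p2 - 1.0), min(p1, p2)
    if not lower <= p_both <= upper:
        raise ValueError("rho is not attainable for these prevalences")
    return p_both


# Made-up inputs: suppose eligibility requires both risks.
p1, p2, rho = 0.2, 0.3, 0.4
eligible_independent = joint_prevalence(p1, p2, rho=0.0)
eligible_correlated = joint_prevalence(p1, p2, rho=rho)
relative_bias = eligible_independent / eligible_correlated - 1.0
print(f"eligible if independent: {eligible_independent:.3f}")
print(f"eligible if correlated:  {eligible_correlated:.3f}")
print(f"relative bias from ignoring correlation: {relative_bias:.1%}")
```

With a positive correlation, assuming independence understates the eligible share; whether the size of that gap is acceptable depends on the research question.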
Todo
Link to future page on residual confounding?
Risk exposure correlation
Instead of sampling from a univariate distribution for each risk exposure at initialization, Vivarium can sample multiple exposures simultaneously from a multivariate distribution. For example, given a correlation coefficient between two continuous exposures, it can sample from a bivariate normal distribution instead of two independent univariate normal distributions. Or, it can sample one of the exposures first and then sample a second categorical exposure using probabilities conditional on the first exposure (as in this example from the IV iron simulation).
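Both sampling strategies can be sketched with the Python standard library. The function names, means, standard deviations, correlation, and conditional probabilities below are all made up for illustration and do not come from Vivarium or the IV iron simulation. The conditional example also simplifies the first exposure to a dichotomous one.

```python
import math
import random


def sample_bivariate_normal(n, means, sds, rho, seed=0):
    """Draw n pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Mixing z1 with independent noise gives corr(z1, z2) == rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        pairs.append((means[0] + sds[0] * z1, means[1] + sds[1] * z2))
    return pairs


def sample_conditional(n, p_first, p_second_given_first, seed=0):
    """Sample a dichotomous exposure, then a second one conditional on it."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        first = rng.random() < p_first
        # The probability of the second exposure depends on the first.
        second = rng.random() < p_second_given_first[first]
        pairs.append((first, second))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two continuous exposures with a made-up correlation of 0.5.
continuous = sample_bivariate_normal(50_000, means=(27.0, 5.5), sds=(4.0, 0.8), rho=0.5)
sample_rho = pearson([x for x, _ in continuous], [y for _, y in continuous])
print(f"sample correlation: {sample_rho:.2f}")

# Second exposure is more likely when the first is present (made-up values).
categorical = sample_conditional(
    50_000, p_first=0.4, p_second_given_first={True: 0.6, False: 0.2}
)
second_prevalence = sum(second for _, second in categorical) / len(categorical)
print(f"prevalence of second exposure: {second_prevalence:.2f}")
```

In the conditional approach, the marginal prevalence of the second exposure is implied by the conditional probabilities (here 0.4 × 0.6 + 0.6 × 0.2 = 0.36), so those probabilities need to be chosen to reproduce the target prevalence.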
We generally cannot get multivariate distributions from the GBD; instead, we seek out auxiliary data sources. These may report summary statistics such as a correlation coefficient, or have full microdata we can use to summarize the joint distribution ourselves. No matter how we characterize the relationship, we prefer to continue to validate to the GBD risk exposure distributions, since we trust them more than any single data source. One way to do this is to extract (a summary of) the joint distribution’s copula from the auxiliary data source and combine that copula with the univariate distributions from GBD – more on this in the next section.
This plot shows data from a joint distribution reflecting the variance and covariance between height and weight for infants at age one month using data from a cohort study. The top panel shows the values for the height and weight z-scores and the bottom panel shows the quantile ranks (propensity values) within the distributions – i.e., the bottom panel shows the copula of the joint distribution.
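One way to combine a copula with separately specified marginals is sketched below, assuming a Gaussian copula. The source does not say which copula family is used, so that choice is an assumption, and the marginal distributions stand in for GBD estimates with made-up parameters. All function names are hypothetical. Note that `rho` here is the correlation of the latent normal variables, which is generally not identical to the correlation of the resulting exposures.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def sample_copula_propensities(n, rho, seed=0):
    """Draw n pairs of correlated propensities from a Gaussian copula."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each component to a Uniform(0, 1) propensity
        # while preserving the dependence between the two components.
        pairs.append((STANDARD_NORMAL.cdf(z1), STANDARD_NORMAL.cdf(z2)))
    return pairs


def continuous_exposure(propensity, marginal):
    """Quantile of a continuous marginal at the given propensity."""
    # Clamp so inv_cdf never receives exactly 0 or 1.
    p = min(max(propensity, 1e-12), 1.0 - 1e-12)
    return marginal.inv_cdf(p)


def dichotomous_exposure(propensity, prevalence):
    """Exposed when the propensity falls in the top `prevalence` share."""
    return propensity > 1.0 - prevalence


# Made-up stand-ins for the univariate (e.g., GBD) exposure distributions.
marginal = NormalDist(mu=27.0, sigma=4.0)
prevalence = 0.25

propensities = sample_copula_propensities(50_000, rho=0.6)
simulants = [
    (continuous_exposure(u1, marginal), dichotomous_exposure(u2, prevalence))
    for u1, u2 in propensities
]

mean_continuous = sum(x for x, _ in simulants) / len(simulants)
observed_prevalence = sum(e for _, e in simulants) / len(simulants)
exposed_values = [x for x, e in simulants if e]
unexposed_values = [x for x, e in simulants if not e]
mean_exposed = sum(exposed_values) / len(exposed_values)
mean_unexposed = sum(unexposed_values) / len(unexposed_values)
print(f"mean of continuous exposure: {mean_continuous:.2f}")
print(f"prevalence of dichotomous exposure: {observed_prevalence:.3f}")
print(f"mean among exposed vs. unexposed: {mean_exposed:.2f} vs. {mean_unexposed:.2f}")
```

The marginal summaries should match the specified distributions regardless of `rho`, which is what allows the simulation to keep validating against the univariate targets while still carrying the dependence taken from the auxiliary data.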
Maintaining correlation over time
Maintaining the correlation is tricky when the risk exposures for a single simulant change over time. In general, we have done this by sampling correlated quantile ranks at initialization and then keeping those quantile ranks static throughout a simulant’s life. These quantile ranks, also known as propensities, are applied to different distributions as the simulant changes age group. As a result, a simulant occupies the same quantile rank in every age group, so their risk exposures in one age group and the next are perfectly rank-correlated.
For example, a simulant might have their propensities for height and weight sampled at birth from a bivariate distribution in which the two components are correlated and are each uniformly distributed (i.e., a bivariate copula). If the simulant receives a propensity of 0.85 (85th percentile rank) in height and 0.80 (80th percentile rank) in weight, they are initially assigned the height and weight values at those percentiles among newborns. But when they age into the next age group, they are assigned the values at those same percentiles in their new age group, and so on throughout their life. In this way, their height and weight can change over time, but the correlation introduced in the initial sampling of propensities remains.
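The example above can be sketched as follows. The age-specific height and weight distributions are made-up normal distributions used only for illustration (they are not GBD or cohort estimates), and `exposure_at_age` is a hypothetical helper.

```python
from statistics import NormalDist

# Made-up age-specific distributions (not GBD or cohort values).
HEIGHT_CM = {"newborn": NormalDist(50.0, 2.0), "one_year": NormalDist(75.0, 3.0)}
WEIGHT_KG = {"newborn": NormalDist(3.3, 0.5), "one_year": NormalDist(9.5, 1.1)}


def exposure_at_age(propensity, age_group, distributions):
    """Exposure at a fixed propensity within an age group's distribution."""
    return distributions[age_group].inv_cdf(propensity)


# Propensities are sampled once (here, the values from the example above)
# and never change; only the distribution they are applied to changes.
height_propensity, weight_propensity = 0.85, 0.80

for age_group in ("newborn", "one_year"):
    height = exposure_at_age(height_propensity, age_group, HEIGHT_CM)
    weight = exposure_at_age(weight_propensity, age_group, WEIGHT_KG)
    print(f"{age_group}: height {height:.1f} cm, weight {weight:.2f} kg")
```

The exposure values change with age group, but each remains at the same percentile of its age-specific distribution, so any dependence built into the propensities at initialization carries forward unchanged.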