Alzheimer’s Population Model with Demographic Forecasts
| Abbreviation | Definition |
|---|---|
| AD | Alzheimer’s Disease |
| BBBM | Blood-Based Biomarker |
| GBD | Global Burden of Disease |
| FHS | Future Health Scenarios |
| MCI | Mild Cognitive Impairment |
Overview
The goal of this population model is to model only simulants with Alzheimer’s disease (and other dementias), in order to reduce the necessary population size for the CSU Alzheimer’s simulation. The model document is split into two parts: 1) initializing the population, and 2) adding new simulants during the simulated timeframe. We will also describe two different versions of the population model, corresponding to successive model versions of the CSU Alzheimer’s simulation:
Models 2 and 3: Modeling simulants with Alzheimer’s disease (AD) and other dementias as defined by GBD
Model 4 and above: Modeling simulants with presymptomatic AD, MCI, or AD dementia
Initializing the Population
We will first describe how to initialize the population for AD and other dementias as defined by GBD, then we will explain how to modify the initialization strategy when including the presymptomatic and MCI stages of AD.
Model Scale
Let \(t_0\) be the starting time of our simulation, let \(X_{t_0}\) be the size of our simulated population at initialization (i.e., the initial population size per draw specified in the concept model), and let \(X^\text{real}_{t_0}\) be the corresponding real-world population at time \(t_0\) that our simulation is supposed to represent. The model scale, \(S\), of our simulation is defined to be \(S = X_{t_0} / X^\text{real}_{t_0}\). We will use the model scale both for initializing our simulated population and for adding new simulants.
In our case, \(X^\text{real}_{t_0}\) is the population of people with Alzheimer’s disease and other dementias at time \(t_0\) in a particular country. We can compute this as
\[X^\text{real}_{t_0} = p_\text{AD} \cdot Y^\text{real}_{t_0},\]
where \(Y^\text{real}_{t_0}\) is the total population at time \(t_0\) in our simulated location according to GBD, and \(p_\text{AD}\) is the prevalence of Alzheimer’s disease and other dementias across all age groups and sexes in that location. Note that the model scale can also be computed as \(S = Y_{t_0} / Y^\text{real}_{t_0}\), where \(Y_{t_0} = X_{t_0} / p_\text{AD}\) is the size of an imagined total model population including all people with and without Alzheimer’s disease, of which those with Alzheimer’s are the ones who appear in our simulation. Putting everything together,
\[S = \frac{X_{t_0}}{X^\text{real}_{t_0}} = \frac{X_{t_0}}{p_\text{AD} \cdot Y^\text{real}_{t_0}}, \tag{1}\]
which computes the model scale in terms of known parameters.
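As a rough illustration, the model scale can be computed from three inputs: the initial simulated population, the all-ages prevalence, and the total real population. This is a minimal sketch rather than the simulation's actual code; the function name and the numeric inputs are hypothetical.

```python
def model_scale(initial_sim_pop: float, prevalence_ad: float, total_real_pop: float) -> float:
    """Return S = X_{t0} / (p_AD * Y^real_{t0})."""
    real_ad_pop = prevalence_ad * total_real_pop  # X^real_{t0}
    if real_ad_pop <= 0:
        raise ValueError("the real-world AD population must be positive")
    return initial_sim_pop / real_ad_pop


# Hypothetical inputs for illustration only (not GBD values).
scale = model_scale(initial_sim_pop=100_000, prevalence_ad=0.02, total_real_pop=50_000_000)
print(f"model scale S = {scale:.4f}")
```

With these made-up inputs, one simulant stands for roughly ten real people with AD.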
Initializing Demographic Subgroups
Let \(g\) denote a demographic subgroup of the population in a given location, namely \(g = (\text{age group, sex})\). For each demographic group \(g\) and time \(t\), we generalize the notation in the previous section to define the following populations in demographic group \(g\) at time \(t\):
\(X_{g,t}\) = the number of simulants in group \(g\) at time \(t\)
\(X^\text{real}_{g,t}\) = the real population corresponding to our simulated population \(X_{g,t}\)
\(Y_{g,t}\) = the imagined total model population in group \(g\), including people with and without AD, of which \(X_{g,t}\) counts the subset with AD
\(Y^\text{real}_{g,t}\) = the total real population in group \(g\) at time \(t\) according to GBD
We need to determine \(X_{g,t_0}\) (the initial simulated population) for each demographic group \(g\). Let \(p_{g,t}\) be the prevalence of Alzheimer’s disease and other dementias in demographic group \(g\) at time \(t\), for a given location. Two relations among the above quantities are:
\[X_{g,t} = S \cdot X^\text{real}_{g,t} \quad\text{and}\quad X^\text{real}_{g,t} = p_{g,t} \cdot Y^\text{real}_{g,t}.\]
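(For \(t\ne t_0\), the first relation assumes that our simulated population accurately tracks the real-world population over time.) Therefore, at time \(t_0\),
\[X_{g,t_0} = S \cdot X^\text{real}_{g,t_0} = S \cdot p_{g,t_0} \cdot Y^\text{real}_{g,t_0} = X_{t_0} \cdot \frac{p_{g,t_0} \cdot Y^\text{real}_{g,t_0}}{p_\text{AD} \cdot Y^\text{real}_{t_0}}, \tag{2}\]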
where the final equality follows from plugging in formula (1) for the model scale \(S\). This equation tells us how many simulants to initialize into each demographic group based on known parameters.
Note
Another way to write (2) is
\[X_{g,t_0} = X_{t_0} \cdot \frac{X^\text{real}_{g,t_0}}{X^\text{real}_{t_0}}.\]
Thus, we could compute \(X_{g,t_0}\) using prevalence counts from GBD instead of prevalence rates.
To verify that (2) gives us the correct total number of initial simulants, note that
\[\sum_g X_{g,t_0} = \frac{X_{t_0}}{X^\text{real}_{t_0}} \sum_g X^\text{real}_{g,t_0} = X_{t_0}.\]
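The allocation across demographic groups, and the check that the group totals add up to the initial population, can be sketched as follows. The function name, the group keys, and all numbers are hypothetical. The values returned are expected counts, which need not be integers.

```python
def initial_simulants(initial_sim_pop, prevalence, real_population):
    """Expected X_{g,t0} for each group g = (sex, age_start)."""
    # Real-world AD population in each group: X^real_{g,t0} = p_{g,t0} * Y^real_{g,t0}
    real_ad = {g: prevalence[g] * real_population[g] for g in real_population}
    # Summing over groups gives X^real_{t0} = p_AD * Y^real_{t0}
    real_ad_total = sum(real_ad.values())
    return {g: initial_sim_pop * count / real_ad_total for g, count in real_ad.items()}


# Hypothetical prevalence and population values (not GBD data).
prevalence = {("F", 70): 0.05, ("F", 75): 0.10, ("M", 70): 0.04, ("M", 75): 0.08}
real_population = {
    ("F", 70): 1_000_000,
    ("F", 75): 800_000,
    ("M", 70): 900_000,
    ("M", 75): 600_000,
}
initial = initial_simulants(100_000, prevalence, real_population)

# The group-level values should add up to the requested initial population.
print(round(sum(initial.values())))
```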
Todo
Add a note about how the initial values in each subgroup are related to the “population structure” of the simulation.
Initializing simulants with presymptomatic and MCI stages
Starting in Model 4 of the CSU Alzheimer’s simulation, the Alzheimer’s cause model includes two pre-dementia stages, BBBM-AD and MCI-AD, in addition to the dementia stage AD-dementia. When computing the model scale and initializing demographic subgroups, \(p_\text{AD}\) should be replaced by \(p_\text{(all AD states)}\), the combined prevalence of the three states BBBM-AD, MCI-AD, and AD-dementia, across all demographic groups at time \(t_0\). Similarly, \(p_{g,t}\) should now refer to the combined prevalence of all three AD stages in demographic group \(g\) at time \(t\). The value of \(p_{g,t}\) is defined on the Alzheimer’s cause model page. With these updated definitions, the model scale and initial population size in each group are defined the same as above:
\[S = \frac{X_{t_0}}{p_\text{(all AD states)} \cdot Y^\text{real}_{t_0}} \quad\text{and}\quad X_{g,t_0} = S \cdot p_{g,t_0} \cdot Y^\text{real}_{g,t_0}.\]
Adding New Simulants
Let \(N_{g,t}\) denote the number of new simulants in demographic group \(g\) that we want to add to the simulation at time \(t\). We will assume that \(N_{g,t}\) is a Poisson random variable with mean \(\lambda_{g,t} \cdot \Delta t \cdot 1_{\{\text{simulation step times}\}}(t)\), where \(\lambda_{g,t}\) is the entrance rate of new simulants (measured in count of simulants per unit time) at time \(t\), \(\Delta t\) is the length of a simulation time step, and \(1_A\) is the indicator function of the set \(A\) (the indicator function zeros out the entrance rate at times when the simulation is not taking a step). Our goal is to determine the entrance rate \(\lambda_{g,t}\) for each \(g\) and \(t\).
Calculating entrance rate when simulating AD-dementia only
First we describe how to calculate the entrance rate in the case where we are modeling only simulants with AD-dementia (i.e., we are not modeling the presymptomatic or MCI stages). Let \(A_g(t)\) be the cumulative number of incident cases of AD by time \(t\) in demographic group \(g\) in the real population. Since our simulation is scaled down by a factor of \(S\), the rate at which we want to add simulants is
\[\lambda_{g,t} = S \cdot \dot A_g(t),\]
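where \(\dot A_g(t)\) is the derivative of \(A_g(t)\) with respect to \(t\). To calculate \(\lambda_{g,t}\), we rewrite it in terms of quantities that we can estimate from the available data:
\[\lambda_{g,t} = S \cdot \frac{\dot A_g(t)}{Y^\text{real}_{g,t}} \cdot Y^\text{real}_{g,t} = S \cdot i_{g,t}^\text{AD} \cdot Y^\text{real}_{g,t}, \tag{3}\]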
where \(i_{g,t}^\text{AD} = \dot A_g(t) /Y^\text{real}_{g,t}\) is the total population incidence hazard of AD in demographic group \(g\) at time \(t\). We know the model scale \(S\) from (1) above, and we can estimate the quantities \(i_{g,t}^\text{AD}\) and \(Y^\text{real}_{g,t}\) from GBD as follows.
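Let \(y(t)\) denote the year to which time \(t\) belongs. If we assume that the hazard \(i_{g,t}^\text{AD}\) is constant throughout the year \(y(t)\), then it is equal to its person-time-average over the year, which is the total population incidence rate:
\[i_{g,t}^\text{AD} = \frac{\text{number of incident cases of AD in group } g \text{ during year } y(t)}{\text{person-years lived in group } g \text{ during year } y(t)}.\]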
This is the raw AD incidence rate we pull from GBD (not the susceptible population incidence rate usually calculated by Vivarium Inputs). If we assume that the population \(Y^\text{real}_{g,t}\) is constant throughout the year \(y(t)\), then it is equal to its time-average over the year:
\[Y^\text{real}_{g,t} = \frac{\text{person-years lived in group } g \text{ during year } y(t)}{1 \text{ year}}.\]
This is the population we pull from GBD using get_population. Thus, (3) expresses the entrance rate \(\lambda_{g,t}\) in terms of quantities we can estimate from data.
Note
Based on plots of AD incidence from GBD Compare, we will make the simplifying assumption that for each demographic group \(g\), the Alzheimer’s incidence rate \(i_{g,t}^\text{AD}\) does not change over time. Thus, we will use GBD 2021 data and assume that \(i_{g,t}^\text{AD}\) equals the AD incidence rate in 2021 for all times \(t\).
For Model 2 of the Alzheimer’s simulation, we will use GBD 2021 data and assume that the total population \(Y^\text{real}_{g,t}\) equals the average population in 2021 for all times \(t\). For Models 3 and higher, we will use forecasted data from FHS to estimate \(Y^\text{real}_{g,t}\) as the average population in year \(y(t)\) for years 2025 through 2050, then assume the total population remains constant thereafter.
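The entrance rate for the AD-dementia-only case, together with the rule of holding the population constant after the last forecast year, might be sketched like this. The function names, the dictionary layout, and the numbers are hypothetical, and the sketch assumes time is measured in fractional calendar years so that \(y(t)\) is the integer part of \(t\).

```python
import math


def population_for_time(population_by_year: dict, time: float, last_year: int = 2050) -> float:
    """Y^real_{g,t}: average population in year y(t), held constant after last_year."""
    year = min(math.floor(time), last_year)
    return population_by_year[year]


def entrance_rate(scale: float, incidence_rate: float, population_by_year: dict, time: float) -> float:
    """lambda_{g,t} = S * i^AD_g * Y^real_{g,t}, in simulants per year.

    The incidence rate is the total-population rate and is assumed constant in time.
    """
    return scale * incidence_rate * population_for_time(population_by_year, time)


# Hypothetical forecast for one demographic group (not FHS data).
population_by_year = {2049: 1_000_000, 2050: 1_100_000}

rate_2049 = entrance_rate(0.1, 0.01, population_by_year, time=2049.5)
rate_2060 = entrance_rate(0.1, 0.01, population_by_year, time=2060.25)  # uses the 2050 population
print(rate_2049, rate_2060)
```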
Alternative view using incidence count
The most direct way to estimate \(\dot A_g(t)\) is to assume it is constant, in which case it equals its time-average. For example, if \(y(t)\) denotes the year to which time \(t\) belongs, and we assume \(\dot A_g(t)\) is constant during the year \(y(t)\), then
\[\dot A_g(t) = \frac{\text{number of incident cases of AD in group } g \text{ during year } y(t)}{1 \text{ year}}.\]
This ends up being equivalent to the method using incidence rates above, but whereas the count of incident cases is likely to vary considerably due to changing demographics, the incidence rate of AD is likely to remain fairly stable over time. Thus, using the incidence rate and the total population is a more appropriate way to use the available data.
Calculating entrance rate with presymptomatic and MCI stages
Let \(B_g(t)\) be the cumulative number of incident cases of BBBM-presymptomatic AD by time \(t\) in demographic group \(g\) in the real population. When including the presymptomatic and MCI stages of AD, instead of defining \(\lambda_{g,t}\) in terms of \(\dot A_g(t)\), the rate at which we want to add simulants is now
\[\lambda_{g,t} = S \cdot \dot B_g(t),\]
where \(S\) is the model scale and \(\dot B_g(t)\) is the derivative of \(B_g(t)\) with respect to \(t\). We can decompose \(B_g(t)\) into two components:
\[B_g(t) = B_{g,t}^\text{AD} + B_{g,t}^\text{die},\]
where, at time \(t\),
\(B_{g,t}^\text{AD}\) = the cumulative number of incident cases of BBBM-AD in group \(g\) that will eventually progress to AD-dementia,
\(B_{g,t}^\text{die}\) = the cumulative number of incident cases of BBBM-AD in group \(g\) that will die before they progress to AD-dementia.
Note that \(B_g^\text{AD}\) and \(B_g^\text{die}\) are defined in terms of future events with respect to the time \(t\), but that’s fine.
We will estimate \(\dot B_g(t) = \dot B_{g,t}^\text{AD} + \dot B_{g,t}^\text{die}\) by making the simplifying assumption that everyone’s duration of pre-dementia AD is exactly equal to the average duration of BBBM-AD plus MCI-AD. This will simplify our calculations and will hopefully give a good enough approximation to closely match the values of \(\dot A_g(t)\) calculated as above.
Estimating BBBM cases that progress to AD-dementia
Let \(\Delta = \Delta_\text{BBBM} + \Delta_\text{MCI}\) be the total average duration of pre-dementia AD, and let \(w\) be the width of an age group (i.e., 5 years for GBD age groups). There exists a unique integer \(n\) and real number \(r\) with \(0\le r < w\) such that
\[\Delta = n w + r.\]
For example, if \(\Delta = 7\) years and \(w=5\) years, then \(n = 1\) and \(r = 2\) years.
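In code, this decomposition is a quotient and remainder, which Python's built-in `divmod` provides directly. The helper name below is hypothetical.

```python
def split_duration(delta: float, width: float = 5.0) -> tuple[int, float]:
    """Return (n, r) such that delta = n * width + r and 0 <= r < width."""
    n, r = divmod(delta, width)
    return int(n), r


n, r = split_duration(7.0, 5.0)
print(n, r)  # n = 1 and r = 2.0, matching the example above

# With a non-integer duration the remainder is subject to floating-point rounding.
n2, r2 = split_duration(10.2, 5.0)
```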
Todo
In model 4.2 we updated the disease state durations so that \(\Delta\) is now about 10.2 years instead of 7 years, so it would be good to update these example numbers using the new value. In a future model version, we may further update these durations to take mortality into account, making them age-dependent. This might require additional changes to how we describe things here.
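Under our simplifying assumption, everyone who enters the count \(B_{g,t}^\text{AD}\) at time \(t\) will transition to AD-dementia at time \(t + \Delta\). Assuming that ages are uniformly distributed within the group \(g\), and working backwards from our calculation of \(\dot A_g(t)\) above, the rate at which the count \(B_{g,t}^\text{AD}\) is increasing should be
\[\dot B_{g,t}^\text{AD} = \frac{w - r}{w} \cdot i_{g+nw,\,t}^\text{AD} \cdot Y^\text{real}_{g + nw,\, t+\Delta} + \frac{r}{w} \cdot i_{g+(n+1)w,\,t}^\text{AD} \cdot Y^\text{real}_{g + (n+1)w,\, t+\Delta}.\]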
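For example, if we write \(g = (F,\,70)\) for females aged 70–74, \(g + 5 = (F,\,75)\) for females aged 75–79, etc., the rate of increase in 2025 of the number of females aged 70–74 who are entering the BBBM-AD state and will enter the AD-dementia state \(\Delta = 7\) years later is calculated as
\[\dot B_{(F,70),\,2025}^\text{AD} = \frac{3}{5} \cdot i_{(F,75),\,2025}^\text{AD} \cdot Y^\text{real}_{(F,75),\,2032} + \frac{2}{5} \cdot i_{(F,80),\,2025}^\text{AD} \cdot Y^\text{real}_{(F,80),\,2032}.\]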
Note that we are assuming that the incidence rate \(i_{g,t}^\text{AD}\) of AD-dementia does not depend on the time \(t\).
Important
Recall that \(i_{g,t}^\text{AD}\) is the total-population incidence rate of AD-dementia, with total population in the denominator instead of susceptible population.
Attention
The last age group we model is 95–100, and for the oldest age groups \(g\), there will be no data for the age groups \(g + nw\) or \(g + (n+1)w\) to plug into the formula for \(\dot B_{g,t}^\text{AD}\). In this case, set \(Y^\text{real}_{g + nw,\, t+\Delta}\) and/or \(Y^\text{real}_{g + (n+1)w,\, t+\Delta}\) to zero, because we don’t expect people to live long enough to transition into AD-dementia in these age groups. The value of \(i_{g+nw,t}^\text{AD}\) and/or \(i_{g+(n+1)w,t}^\text{AD}\) can be filled in with any finite value in this case since it will be getting multiplied by zero.
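The calculation of \(\dot B_{g,t}^\text{AD}\), including the treatment of the oldest age groups, could look roughly like the sketch below. It assumes that, with ages uniform within group \(g\), a fraction \((w-r)/w\) of the group lands in age group \(g+nw\) after \(\Delta\) years and a fraction \(r/w\) lands in \(g+(n+1)w\). It also assumes time is a fractional calendar year. The function name, dictionary keys, and numbers are hypothetical.

```python
import math

OLDEST_AGE_START = 95  # the last modeled age group is 95-100


def bbbm_rate_progressing(sex, age_start, time, delta, width, incidence, population):
    """Rate of BBBM-AD incidence among people who later reach AD-dementia.

    incidence[(sex, age_start)] is the total-population AD-dementia incidence rate
    (assumed constant in time); population[(sex, age_start, year)] is Y^real.
    """
    n, r = divmod(delta, width)
    n = int(n)
    future_year = math.floor(time + delta)
    total = 0.0
    for k, weight in ((n, (width - r) / width), (n + 1, r / width)):
        future_age = age_start + k * width
        if future_age > OLDEST_AGE_START:
            continue  # population treated as zero beyond the oldest age group
        total += weight * incidence[(sex, future_age)] * population[(sex, future_age, future_year)]
    return total


# Hypothetical inputs (not GBD or FHS data).
incidence = {("F", 75): 0.02, ("F", 80): 0.04}
population = {("F", 75, 2032): 1_000_000, ("F", 80, 2032): 500_000}

rate_70 = bbbm_rate_progressing("F", 70, 2025, 7, 5, incidence, population)
rate_95 = bbbm_rate_progressing("F", 95, 2025, 7, 5, incidence, population)  # no older groups
print(rate_70, rate_95)
```

Skipping the out-of-range age groups is equivalent to multiplying their incidence rates by a population of zero.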
Estimating BBBM cases that die during pre-dementia AD
In order to get the correct number of people transitioning into the AD-dementia state at time \(t+\Delta\), we need to account for people who will die during the BBBM-AD and MCI-AD stages. That is, we need to estimate \(\dot B_{g,t}^\text{die}\). To do this, let \(\gamma_{g,t}\) be the probability that a person in group \(g\) who enters the BBBM-AD state at time \(t\) dies before they reach the AD-dementia state. Then for a large population,
\[\dot B_{g,t}^\text{die} \approx \gamma_{g,t} \cdot \dot B_g(t).\]
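Then we can solve for \(\dot B_g(t)\) to get
\[\dot B_g(t) = \dot B_{g,t}^\text{AD} + \dot B_{g,t}^\text{die} \approx \dot B_{g,t}^\text{AD} + \gamma_{g,t} \cdot \dot B_g(t) \quad\Longrightarrow\quad \dot B_g(t) \approx \dot B_{g,t}^\text{AD} \cdot \frac{1}{1 - \gamma_{g,t}}.\]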
Thus, instead of estimating \(\dot B_{g,t}^\text{die}\) directly, we can estimate the probability \(\gamma_{g,t}\) and combine it with our calculation of \(\dot B_{g,t}^\text{AD}\) above to directly estimate the total rate \(\dot B_g(t)\) at which people are entering the BBBM-AD state. To finish the calculation, we need to estimate \(\gamma_{g,t}\).
To estimate the mortality probability \(\gamma_{g,t}\), let \(m_{g,t}\) denote the background mortality hazard for people in group \(g\) at time \(t\). This is the mortality hazard experienced by people in the BBBM and MCI states and is equal to the all-cause mortality rate minus the cause-specific mortality rate for AD-dementia. Using our assumption that the duration of pre-dementia AD is exactly \(\Delta\), plus the definition of mortality hazard,
\[\gamma_{g,t} = 1 - \exp\left(-\int_0^\Delta m_{g + \tau,\,t+\tau}\, d\tau\right) = 1 - \exp\left(-\bar m_{g,t} \cdot \Delta\right),\]
where \(\bar m_{g,t} = \frac{1}{\Delta} \int_0^\Delta m_{g + \tau,\,t+\tau}\, d\tau\) is the time-average of the mortality hazard over the interval \([t, t+\Delta]\).
Aside
Question: If we want to replace \(m_{g,t}\) with a constant average hazard, why is it that here we use a time-average, whereas in other situations (incidence rates, mortality rates) we use a person-time average? Because hazard x time = probability, whereas hazard x person-time = count of people. In this case we’re computing a probability, not a count of people.
To make things simple, we can estimate the average mortality hazard \(\bar m_{g,t}\) as the mortality rate at the midpoint of the time interval, \(t + \Delta / 2\). That is,
\[\bar m_{g,t} \approx m_{g + \frac{\Delta}{2} ,\, t + \frac{\Delta}{2}}.\]
Continuing the example from above, the probability of death among females aged 70–74 who enter the BBBM-AD state in 2025 is approximately
\[\gamma_{(F,70),\,2025} \approx 1 - \exp\left(-7 \cdot m_{(F,75),\,2029}\right).\]
Note that since we have estimates of mortality rates for 5-year age groups and single years, we have rounded to the nearest age group and year. For example \((F,70) + 3.5\) represents females aged 73.5–78.5 (i.e., \([73.5, 78.5)\)), so we round to the nearest age group of 75–79 (i.e., \([75, 80)\)). With \(\Delta = 7\) years and \(w = 5\) years, \(g+ \Delta/2\) should always get rounded to \(g + 5\). For the year, 2025 + 3.5 = 2028.5, and, since this is right at the year’s midpoint, we have arbitrarily rounded up instead of down.
Attention
If \(g+ \Delta/2\) has an age range beyond the oldest age group of 95–100, use the corresponding mortality rate for the 95–100 age group, since we expect mortality rates to stay at least this high for older ages.
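The midpoint approximation, the rounding to the nearest age group and year, and the cap at the oldest age group can be sketched as follows. Rounding halves upward is an assumption taken from the arbitrary choice in the example above. The function name, the dictionary layout, and the mortality values are hypothetical.

```python
import math


def death_probability(sex, age_start, year, delta, width, mortality, oldest_age_start=95):
    """Approximate gamma_{g,t} = 1 - exp(-delta * m) using the midpoint mortality rate.

    mortality[(sex, age_start, year)] is the background mortality hazard per year.
    """
    # Shift the age group by delta / 2 and round to the nearest standard age group.
    mid_age = age_start + delta / 2
    mid_group = int(width * math.floor(mid_age / width + 0.5))
    # Beyond the oldest modeled group, reuse that group's mortality rate.
    mid_group = min(mid_group, oldest_age_start)
    # Round the midpoint year to the nearest year, with halves rounded up.
    mid_year = math.floor(year + delta / 2 + 0.5)
    hazard = mortality[(sex, mid_group, mid_year)]
    return 1.0 - math.exp(-hazard * delta)


# Hypothetical background mortality hazards (not GBD or FHS data).
mortality = {("F", 75, 2029): 0.03, ("F", 95, 2029): 0.30}

gamma_70 = death_probability("F", 70, 2025, 7, 5, mortality)  # looks up (F, 75, 2029)
gamma_95 = death_probability("F", 95, 2025, 7, 5, mortality)  # capped at the 95-100 group
print(round(gamma_70, 3), round(gamma_95, 3))
```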
Note
We can get a better approximation of \(\gamma_{g,t}\) by making a more careful approximation of the integral in the exponent (see this GitHub comment for an explanation):
Note that the weights add up to \(\Delta\), showing that this is a refinement of the approximation \(\Delta \cdot m_{g + \frac{\Delta}{2} ,\, t + \frac{\Delta}{2}}\).
With \(\Delta = 7\) years and \(w = 5\) years, we have \(n = 1\), so there are only three terms in the sum, corresponding to \(g\), \(g + 5\), and \(g + 10\), and the coefficients for these three terms are the three special cases in the above sum (the generic coefficient of \(w\) never appears). Using our running example,
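One way to obtain coefficients with these properties is to compute the expected time a person spends in each age group during the \(\Delta\) years, assuming their age is uniformly distributed within group \(g\) at entry. The sketch below is a derivation under that assumption and may differ in detail from the formula in the referenced GitHub comment. It deliberately ignores the time index of the mortality rates and requires \(\Delta \ge w\). The function name is hypothetical.

```python
def age_group_weights(delta: float, width: float = 5.0) -> list[float]:
    """Expected years spent in groups g, g + w, ..., g + (n + 1) * w during delta years.

    Assumes age at entry is uniform within group g and that delta >= width.
    """
    if delta < width:
        raise ValueError("this sketch assumes delta >= width")
    n, r = divmod(delta, width)
    n = int(n)
    weights = [width] * (n + 2)  # generic coefficient w for groups 1 .. n - 1
    weights[0] = width / 2  # starting group: on average half a bin remains
    weights[n] = width / 2 + r - r**2 / (2 * width)  # group reached after n full bins
    weights[n + 1] = r**2 / (2 * width)  # last group, entered only by the oldest people
    return weights


weights = age_group_weights(7.0, 5.0)
print(weights, sum(weights))
```

For \(\Delta = 7\) and \(w = 5\) this gives three coefficients (about 2.5, 4.1, and 0.4 years) that sum to 7, so the generic coefficient \(w\) is never used.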
Entrance rate into the BBBM state in the simulation
As noted above, the rate at which real-world people in demographic group \(g\) are entering the BBBM-AD state at time \(t\) is approximately \(\dot B_{g,t}^\text{AD} \cdot \frac{1}{1 - \gamma_{g,t}}\). Multiplying by the model scale \(S\), the rate at which we want to add simulants into the BBBM-AD state is then
\[\lambda_{g,t} = S \cdot \dot B_g(t) \approx S \cdot \dot B_{g,t}^\text{AD} \cdot \frac{1}{1 - \gamma_{g,t}}.\]
If \(t\) is a step time of the simulation, the average number of simulants we want to add at time \(t\) is then
\[\mathbb{E}[N_{g,t}] = \lambda_{g,t} \cdot \Delta t,\]
where \(\Delta t\) is the step size of the simulation (currently defined as 183 days). Finally, the number of simulants actually added at time \(t\) will be a Poisson-distributed random variable with mean \(\lambda_{g,t} \cdot \Delta t\).
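The last two steps, converting the entrance rate into a per-step mean and drawing a Poisson count, can be sketched with the standard library alone. The sketch assumes rates are expressed per year and a year is 365.25 days. The Poisson sampler uses Knuth's multiplication method, which is only suitable for modest means; a production implementation would more likely use something like NumPy's `Generator.poisson`. All names and numbers are hypothetical.

```python
import math
import random

DAYS_PER_YEAR = 365.25  # assumption used to convert the step size to years


def expected_new_simulants(scale, rate_progressing, death_prob, step_days):
    """Mean of N_{g,t}: S * B^AD rate / (1 - gamma) * step length in years."""
    entrance_rate = scale * rate_progressing / (1.0 - death_prob)  # simulants per year
    return entrance_rate * step_days / DAYS_PER_YEAR


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Draw a Poisson count with Knuth's method (needs exp(-mean) > 0)."""
    if mean < 0:
        raise ValueError("mean must be non-negative")
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


# Hypothetical inputs: S = 0.001, 20,000 real cases per year, gamma = 0.2, 183-day step.
mean = expected_new_simulants(0.001, 20_000, 0.2, 183)
rng = random.Random(12345)
new_simulants = sample_poisson(mean, rng)
print(round(mean, 2), new_simulants)
```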
Implementation and data tables
Todo
Write up more concrete, direct instructions for implementation, including:
Specification of exactly what data to use (data tables)
Reiterate equation for entrance rate, using notation consistent with cause model page
Make sure to spell out how the length of the time step is involved
Reiterate that we need to sample a Poisson count with the specified mean
Strategy for sampling continuous ages uniformly within age bins, including capping the oldest age bin (95+) at 100 when adding new simulants
Also, maybe this should go in another top-level section and include instructions for initialization as well, instead of being a subsection of the “adding new simulants” section.
Note that the engineers said that the number of simulants initialized into each age group at time \(t_0\) is also random, but I’m not sure exactly how it works (e.g., is the number of initial simulants in each group also a Poisson random variable?).
Data Tables
The following table shows the variables that come directly from our data sources. Other quantities needed for the simulation are defined above in terms of these values.
| Variable | Definition | Source or value | Notes |
|---|---|---|---|
| \(X_{t_0}\) | The initial size of our simulated population | “Initial population size per draw” in the simulation parameter specifications table in the concept model | Includes all demographic groups |
| \(Y^\text{real}_{g,t}\) | Total population of demographic group \(g\) at time \(t\) | population_forecast in AD cause model data sources table | From GBD 2021 Forecasting Capstone. Available for years 2021-2050. |
| \(p_{g,t}\) | The combined prevalence of all AD stages in demographic group \(g\) at time \(t\) | \(p_\text{(All AD states)}\) in the Attention box on the AD cause model page | Calculated from the GBD 2023 dementia envelope using the dementia subtype proportions provided by the dementia modelers. We will only need the value for the single year \(t_0\). |
| \(i^\text{AD}_{g,t}\) | Total-population incidence rate of AD dementia in demographic group \(g\) at time \(t\) | incidence_AD in AD cause model data sources table | Calculated from the GBD 2023 dementia envelope using the dementia subtype proportions provided by the dementia modelers. Assumed to be independent of \(t\). |
| \(m_{g,t}\) | Background mortality hazard in demographic group \(g\) at time \(t\) | m_BBBM or m_MCI in the AD cause model data sources table | Equal to all-cause mortality rate minus cause-specific mortality rate for AD-dementia. Uses all-cause mortality rate forecasts for 2021–2050 from GBD 2021 Forecasting Capstone. |
| \(\Delta_\text{BBBM}\), \(\Delta_\text{MCI}\) | Average duration of the BBBM-AD state or MCI-AD state, respectively | \(\Delta_\text{BBBM}\) and \(\Delta_\text{MCI}\) in the AD cause model data sources table | |
| \(w\) | The width of a standard GBD age bin | 5 years | We are not modeling the youngest age groups, which have smaller age bins, and we are capping the 95+ age bin at 100, making it a standard 5-year age bin |