Understanding and Pulling GBD Data
Global Burden of Disease (GBD) Study data is a fundamental data source for our simulation models. Understanding what data is available in the GBD and what modeling processes produced it is a difficult task. Some helpful resources for understanding the GBD study are listed below:
IHME onboarding trainings
GBD capstone papers and their methods appendices, such as:
The GBD compare tool, which allows you to visualize GBD estimates
Your simulation science team members!
Talking to GBD modelers directly
Pulling GBD Data using Vivarium Inputs
There are two main packages within the Vivarium software framework that are especially useful for interacting with GBD data: gbd_mapping and vivarium_inputs.
Both of these packages translate ID numbers used in GBD to human-readable text.
Overview of gbd_mapping
gbd_mapping provides a convenient way to access all of the metadata associated with a given GBD entity (e.g., the diarrheal diseases cause or the child growth failure risk factor), but does not return any estimates associated with that entity (e.g., prevalence or relative risks).
Overview of vivarium_inputs
vivarium_inputs provides simplified functions to query GBD data and reformats the data to match the structure required for building Vivarium Artifact objects. It generally returns data for the most recent complete GBD round/release and does not allow users to specify prior rounds/releases. Ask the software engineers if you have questions about which GBD round/release is active in vivarium_inputs at any given time. If there is any doubt about which GBD version a given vivarium_inputs call returns, you can use get_raw_data, which returns the full data, including GBD version IDs, for that call.
For documentation on Vivarium Inputs, click here.
Some important notes and considerations not included in the documentation above are listed below:
Todo
List default behavior of get_measures/other functions once the GBD 2021 update is finalized, including things like:
Returning most recent available year - note potential exception with risk effects?
Filtering of draws (reduction of 1,000 COD draws down to 500 that are present in COMO)?
Returning all ages/sexes and filling NaNs with zeros
Version ID behavior with GBD 2021?
Anything else?
Measure | Data returned | Note |
---|---|---|
incidence_rate | GBD_incidence / (1 - GBD_prevalence) | By default, get_measures automatically converts GBD’s “population-level incidence rates” to “susceptible population incidence rates” using the GBD estimate of prevalence. Note that if a model is using an alternative value for prevalence, this rescaling should be done separately using that prevalence value. |
raw_incidence_rate | GBD_incidence | |
cause_specific_mortality_rate | GBD_death_count / GBD_population_counts | |
excess_mortality_rate | cause_specific_mortality / GBD_prevalence | By default, get_measures calculates excess mortality rates in accordance with the GBD estimate of prevalence. If a model is using an alternative value for cause prevalence, excess mortality rates should likely be calculated separately using that prevalence value. |
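When a model uses its own prevalence value, the susceptible-population incidence and excess mortality calculations described in the table can be redone by hand. The following is a minimal sketch of that arithmetic only, not vivarium_inputs code. The function names and all numeric values are hypothetical and chosen purely for illustration.

```python
def susceptible_incidence_rate(population_incidence: float, prevalence: float) -> float:
    """Rescale a population-level incidence rate to the susceptible population.

    Implements incidence / (1 - prevalence).
    """
    if not 0 <= prevalence < 1:
        raise ValueError("prevalence must be in [0, 1)")
    return population_incidence / (1 - prevalence)


def excess_mortality_rate(cause_specific_mortality: float, prevalence: float) -> float:
    """Compute excess mortality as cause-specific mortality / prevalence."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return cause_specific_mortality / prevalence


# Hypothetical inputs, for illustration only (not GBD estimates).
gbd_incidence = 0.02      # population-level incidence rate
gbd_csmr = 0.001          # cause-specific mortality rate
model_prevalence = 0.2    # alternative prevalence used by the model

incidence_among_susceptible = susceptible_incidence_rate(gbd_incidence, model_prevalence)
emr = excess_mortality_rate(gbd_csmr, model_prevalence)

print(f"susceptible-population incidence rate: {incidence_among_susceptible:.4f}")
print(f"excess mortality rate: {emr:.4f}")
```

In practice these calculations would be applied per draw and per age/sex group rather than to single numbers; the scalar version is used here only to keep the arithmetic visible.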
Applied examples
Todo
Link notebook that shows examples of using these functions.
Considerations of each approach
Generally, GBD shared functions offer greater flexibility in querying GBD data than Vivarium Inputs, but require specification of detailed IDs that are not human-readable and require translation with get_ids. Vivarium Inputs offers less flexibility in favor of the convenience of returning a human-readable version of the most relevant data for running Vivarium simulations and compatibility with required Vivarium Artifact formatting. Therefore, GBD shared functions may be the code base to use when taking deep dives into GBD data, and Vivarium Inputs when preparing GBD data for Vivarium simulations. Some additional specific considerations about the differences between the two options are summarized in the table below.
Topic | GBD Shared Functions | Vivarium Inputs |
---|---|---|
GBD round | Able to specify any GBD round/release; useful for noting and comparing major changes between rounds | Returns most recent complete GBD round/release only |
DALYs | Returns YLD, YLL, and DALY estimates | Does not return YLD, YLL, or DALY estimates |
Metrics | Returns counts, rates, and prevalence estimates | Conveniently returns rate estimates, with the exception of population structure, which is returned in counts |
Summary values | Can return mean, upper, and lower estimates using get_outputs | Returns draw-level estimates only |
Age/sex/location specificity | Allows specification across all of these parameters, as well as grouping (via get_outputs) and/or aggregation (via make_custom_aggregates) across demographic categories | Returns estimates for all most-detailed age groups and sexes. Supports only one location at a time. |
Format | Generally uses ID numbers that are not human-readable until paired with get_ids information | Converts IDs to human-readable entity names and is compatible with the formatting required for Vivarium Artifacts and simulations |
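Because Vivarium Inputs returns draw-level estimates only, summary values have to be computed from the draws if they are needed. The sketch below assumes the common convention of summarizing draws by their mean and their 2.5th and 97.5th percentiles. The helper names and the example draws are hypothetical, and only the Python standard library is used so that the logic stays visible.

```python
from statistics import mean


def percentile(sorted_values: list, q: float) -> float:
    """Return the q-th percentile (0-100) of an ascending list, using linear interpolation."""
    if not sorted_values:
        raise ValueError("at least one value is required")
    position = (len(sorted_values) - 1) * q / 100
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    lower_value = sorted_values[lower_index]
    upper_value = sorted_values[upper_index]
    return lower_value + fraction * (upper_value - lower_value)


def summarize_draws(draws: list) -> dict:
    """Summarize draw-level values as a mean with 2.5th and 97.5th percentile bounds."""
    ordered = sorted(draws)
    return {
        "mean": mean(ordered),
        "lower": percentile(ordered, 2.5),
        "upper": percentile(ordered, 97.5),
    }


# Hypothetical draws for one age/sex/year group (not GBD data).
example_draws = [float(i) for i in range(1, 1001)]
summary = summarize_draws(example_draws)
print(summary)
```

With real data, the same summary would be computed separately for each demographic group, for example across the draw columns of each row of a data frame.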
Note
Generally, to convert a GBD shared function entity name (such as cause_name) to the corresponding entity name in Vivarium Inputs, convert it to all lower case and replace spaces with underscores. Python code to do this is shown below:
vivarium_inputs_entity_name = gbd_entity_name.lower().replace(' ', '_')
Some entity names are exceptions to this rule and require additional conversion. These can be viewed in the clean_entity_list method in the Vivarium Inputs source code, found here.
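As a worked illustration of the simple rule in this note, the sketch below wraps it in a function and applies it to a few names. The function name is hypothetical, and it deliberately does not reproduce the additional handling in clean_entity_list. The last example shows a name containing punctuation, which the simple rule leaves untouched; this is assumed to be the kind of case where additional conversion is needed.

```python
def to_vivarium_inputs_name(gbd_entity_name: str) -> str:
    """Apply the simple rule only: lower-case the name and replace spaces with underscores."""
    return gbd_entity_name.lower().replace(" ", "_")


example_names = [
    "Diarrheal diseases",
    "Child growth failure",
    "Alzheimer's disease and other dementias",  # apostrophe is not handled by the simple rule
]

converted = {name: to_vivarium_inputs_name(name) for name in example_names}

for original, result in converted.items():
    print(f"{original!r} -> {result!r}")
```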