2005-2009 Version 1 U.S.-wide Synthetic Population database available

gisgeek · January 19, 2012, 7:11pm

Hello–

I thought the OpenABM community would be interested to know that a complete, U.S.-wide, geospatially explicit synthetic population database is now available. It was created using the 2005-2009 American Community Survey 5-year sample. The data are freely available for download by state or county.

**Purpose:** This dataset was created as a generic, nationwide synthetic population infrastructure for the ABM developer community.

Summary information:
With the recent release of 5-Year American Community Survey (ACS) data for the 2005-2009 period, RTI has generated a new synthetic population database using the updated ACS demographic data. The synthetic population database contains a computerized representation of every household (112,383,675) and every person (282,735,758) living in a household in the U.S., along with many household-level characteristics (for example, household income, householder race, householder age, and household size) and person-level characteristics (for example, age, sex, race, and ethnicity). In addition, each household has a latitude/longitude coordinate. The spatial distribution of the synthesized households closely matches the general spatial distribution of the U.S. population.

The database was created by RTI International (http://www.rti.org) under a grant from the National Institute of General Medical Sciences called the ‘Models of Infectious Disease Agent Study’ (MIDAS, http://www.nigms.nih.gov/Research/FeaturedPrograms/MIDAS/).

Accuracy: The 2005-2009 Ver. 1 Synthetic Population dataset was built by selecting households from the 2005-2009 ACS Public Use Microdata Sample (PUMS) 5% data sample so that the selected households match the aggregated demographic counts, by census block group, from the 2005-2009 ACS estimates. The selection method attempts to match the distributions of household size, household income, householder age, and householder race at the block group level. Detailed comparison tables that show how well the matching succeeded for each variable and block group are included in the dataset.
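For readers unfamiliar with this kind of marginal matching, here is a minimal sketch of iterative proportional fitting (IPF), one common technique for adjusting a seed table of sample households so that it agrees with known totals. The post does not say which algorithm RTI used, so this is only an illustration, not a description of their method. The function name, the two-variable table, and all the numbers are made up for the example.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Scale a 2-D seed table until its row and column sums match the targets.

    seed        -- list of lists, e.g. sample household counts cross-classified
                   by household size (rows) and income band (columns)
    row_targets -- desired total for each row (e.g. block group counts by size)
    col_targets -- desired total for each column (e.g. counts by income band)

    Illustrative only: the variables and numbers are assumptions, and this is
    not claimed to be the procedure used to build the dataset.
    """
    table = [list(row) for row in seed]
    for _ in range(iterations):
        # Rescale every row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Rescale every column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table


# Made-up example: 100 households in one block group.
fitted = ipf(seed=[[1.0, 2.0], [3.0, 4.0]],
             row_targets=[40.0, 60.0],
             col_targets=[30.0, 70.0])
```

The fitted cell values can then be treated as expected counts when drawing sample households for the block group. The row and column targets must add up to the same total for the fit to converge.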

Documentation: A QuickStart Guide that describes the content and structure of the database is available here (https://www.epimodels.org/midasdocs/SynthPop/2005-2009_synth_pop_ver1_quickstart_2012_01_10.pdf).

Data Download: Data are provided in zipped files containing a set of ASCII comma-separated files. To access the data, go to the download page (https://www.epimodels.org/midas/pubsyntdata1.do#downloads).
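As a rough starting point for working with a downloaded archive, the sketch below reads one comma-separated file straight out of a zip and keeps the households that fall inside a bounding box. The archive name, member file name, and the `latitude`/`longitude` column names are placeholders I made up; check the QuickStart Guide for the actual file layout and field names.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_path, member_name):
    """Return the rows of one CSV file inside a zip archive as a list of dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            # The files are described as ASCII text, so decode accordingly.
            text = io.TextIOWrapper(raw, encoding="ascii", newline="")
            return list(csv.DictReader(text))


def households_in_bbox(rows, south, west, north, east,
                       lat_field="latitude", lon_field="longitude"):
    """Keep rows whose coordinate lies inside the bounding box.

    lat_field and lon_field are hypothetical column names; replace them
    with the names documented for the real household file.
    """
    selected = []
    for row in rows:
        lat = float(row[lat_field])
        lon = float(row[lon_field])
        if south <= lat <= north and west <= lon <= east:
            selected.append(row)
    return selected
```

A call might look like `households_in_bbox(read_csv_from_zip("county.zip", "households.txt"), 35.0, -80.0, 37.0, -77.0)`, where both file names are again placeholders. State-level files may be large, so for those it could be better to iterate over the reader instead of building a full list.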

Future Plans: We will be enhancing the database very shortly by a) adding group quarters facilities and synthetic group quarters residents; and b) assigning school-aged children to real schools based on location and school capacity by grade. Our previous synthetic database also included assignments of working persons to workplaces, and we hope to provide the same data in the future, depending on the availability of new census commuting data.

Contact for further info: If you have questions about the data, how to access it, or how to use it, please contact Bill Wheaton ([email protected]) for more information.

cmbarton · January 20, 2012, 3:10pm

Thanks much for announcing this. It sounds like a very good resource for the community.

Michael Barton