Dasymetric Modelling of Population Distribution – Large Data Approach

Introduction

Access to high resolution data on population distribution is needed for a wide range of applications related to urban and transport planning (Benn 1995, Murray et al. 1998, Pattnaik et al. 1998), resources management (Gleick 1996), disaster relief and mitigation (Bhaduri et al. 2002), assessment of human pressure on the environment (Weber, Christophersen 2002) and quantifying environmental impact on population (Vinkx, Visee 2008). Reliable information on population distribution is also essential for characterizing populations at risk from natural hazards (Dobson et al. 2000, Chen et al. 2004, Tralli et al. 2005, Thieken et al. 2006, McGranahan et al. 2007, Maantay, Maroko 2009, Mondal, Tatem 2012, Tatem et al. 2012, Berke et al. 2015, Tenerelli et al. 2015, Calka et al. 2017) and for public health applications such as disease burden estimation and epidemic modelling (Hay et al. 2005, Tatem et al. 2008, 2011, 2012).

The quality of population data varies from one country to another, especially between low-income and high-income countries. Low-income countries often lack population data, or the available data is of poor quality (Tatem et al. 2007). High-income countries usually have resources to collect data for each household, but such data is only released in the form of areal aggregates to protect privacy (Langford 2013, Bakillah et al. 2014). The size of aggregation units differs between countries; however, even the smallest aggregation units often have insufficient spatial resolution for many practical applications. Aggregated data also has many limitations (Schroeder 2007, Dmowska et al. 2017), including:

spatial resolution depends on the choice of census units, and it is spatially varying (low in rural areas, higher in urban areas),

mapped population is distributed uniformly within each census unit, even if the majority of the area is uninhabited (covered by parks, forest, water, etc.), and

spatial extents of census units change with time, which makes it difficult to conduct year-to-year comparisons.

Furthermore, such data is usually available in the form of attribute tables, with the option to join them to vector files containing boundaries of aggregation units in order to perform GIS-based analysis. This makes the data difficult to work with, especially for large areas.

Most of the aforementioned limitations of aggregated data can be overcome by gridded (raster) data. The advantages of using gridded data include:

spatial resolution, defined by the size of the cell, is high and spatially constant over the whole area;

the extent of cells does not change between years, making year-to-year comparisons easy to perform, and

uninhabited areas can be properly identified and mapped via dasymetric modelling, thus making population maps more accurate.

Several methods for disaggregating census data into grid cells (or smaller areal units) have been introduced over the years. Such methods can be divided into two groups: areal weighting (Goodchild, Lam 1980, Flowerdew, Green 1992, Goodchild et al. 1993) and dasymetric modelling (Wright 1936, Langford, Unwin 1994, Eicher, Brewer 2001, Mennis 2003). Areal weighting is a type of areal interpolation used to transform geographic data from one set of boundaries to another; it assigns to each grid cell a population value based on the cell's percentage of the area of its host areal unit (Mennis 2003). Dasymetric modelling uses ancillary information of higher spatial resolution to help refine the location of population during the process of disaggregating spatial data to finer units (Mennis 2003).
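For reference, areal weighting can be summarized in a single formula; the notation below is introduced here for illustration only and does not come from the cited papers:

```latex
P_t = \sum_{s} P_s \, \frac{A_{s \cap t}}{A_s}
```

where P_t is the population estimated for a target cell t, P_s and A_s are the population and area of a source unit s, and A_{s∩t} is the area of overlap between s and t.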

Dasymetric modelling takes advantage of a correlation (a model) between population density and the values of an ancillary variable; the stronger the correlation (the better the model), the more accurate the resulting population grid. Dasymetric modelling is well established in the literature (for a review see Petrov 2012). It was defined and developed in 1911 by Benjamin (Veniamin) Petrovich Semenov-Tyan-Shansky (Bielecka 2005, Petrov 2012) and popularized by Wright (1936). After 2000, interest in dasymetric mapping increased significantly due to progress in GIS and remote sensing technologies (Mennis 2009, Petrov 2012). Published papers describe the development of new approaches to dasymetric modelling based on different ancillary data and a variety of techniques for establishing the relation between population density and the values of an ancillary variable. These papers focus on the theory and do not provide actual datasets resulting from the proposed techniques.

Among the ancillary data used to disaggregate population, the most popular is land cover data (Wright 1936, Mennis 2003, Bielecka 2005, Gallego et al. 2011, Linard et al. 2011, Dmowska, Stepinski 2014, 2017a, b, Dmowska et al. 2017). Land cover data is provided in the form of a categorical grid, with different categories indicating types of land cover. Broad-scale land cover datasets are obtained by classifying large mosaics of remotely sensed multispectral images; they have a spatial resolution higher than the resolution of aggregated census units. One problem with land cover datasets is that they are based on surface spectral properties, leading to possible confusion between populated and unpopulated objects (for example buildings) having the same spectral signatures. This problem can be minimized by adding land use data as an additional ancillary variable (Dmowska, Stepinski 2017a).

Other sources of ancillary data are high resolution satellite images (Lu et al. 2010, Ural et al. 2011, Lung et al. 2013), LIDAR data (Lu et al. 2010), tax parcel data (Maantay et al. 2007, Tapp 2010, Jia et al. 2014, Jia, Gaughan 2016), street density (Reibel, Bufalino 2005), density of points of interest (Bakillah et al. 2014), light emission data (Briggs et al. 2007, Sridharan, Qiu 2013) and address datasets (Zandbergen 2011). Recently, social media data have also been used (Patel et al. 2017). Such datasets can be used individually or in combination to construct a dasymetric model.

Many papers concentrate on establishing the relation between population and ancillary data in dasymetric modelling. These approaches have changed over the years from using predetermined weights (binary approach or limiting variable estimation; Eicher, Brewer 2001), through empirical sampling (Mennis 2003, Mennis, Hultgren 2006), to employing statistical techniques such as regression analysis (Flowerdew, Green 1992, Briggs et al. 2007) or random forests (Stevens et al. 2015). An overview of the developed methods can be found, among others, in Wu et al. (2005) and Maantay et al. (2007).

Despite the increasing body of literature describing various techniques for dasymetric modelling, there is still a lack of high resolution population grids. There are only a few products which provide high resolution population grids on a global or continental scale (Table 1).

Table 1. Characteristics of broad-scale population grids.

| Project | Region | Resolution | Timestamp | Availability |
| --- | --- | --- | --- | --- |
| WorldPop | South America, Central America, Africa, Asia | 100 m (country), 1 km (continent) | 2010–2020 with 5 year interval | http://www.worldpop.org.uk |
| LandScan | world-wide | 1 km | 2000–2017 with 1 year interval | https://landscan.ornl.gov |
| GPWv4 | world-wide | 1 km | 2000, 2010, 2015, 2020 | http://sedac.ciesin.columbia.edu |
| E.U. pop grid | European Union countries | 100 m | 2000 | http://www.eea.europa.eu |
| Australian pop. grid | Australia | 1 km | 2011 | http://www.abs.gov.au |
| SEDAC-USA | United States | 1 km (USA), 250 m (MSA) | 1990, 2000, 2010 | http://sedac.ciesin.columbia.edu |
| SocScape | United States | 30 m | 1990, 2000, 2010 | http://sil.uc.edu |

LandScan and the Gridded Population of the World, Version 4 (GPWv4) provide population grids at global scale at a resolution of 30 arc-seconds (approximately 1 km at the equator). LandScan is developed by the Oak Ridge National Laboratory using the best available census data for particular regions. It is a product of dasymetric modelling based on land cover, roads, slope, urban areas, village locations and high resolution imagery analysis as ancillary data, and sub-national level census counts for each country as population data. The LandScan population grid is a combination of locally adaptive models that are tailored to account for the differences in spatial data availability, quality, scale, and accuracy for each individual country and region (ORNL 2019).

Gridded Population of the World, Version 4 (SEDAC 2019) provides gridded population estimates at a resolution of 30 arc-seconds (approximately 1 km at the equator) for the years 2000, 2005, 2010, 2015, and 2020. The census data, collected around 2010 (between 2005 and 2014), are extrapolated to a series of output years. GPWv4 is the result of a uniform areal weighting approach (Doxsey-Whitfield et al. 2015).

The WorldPop project (2019) provides population grids at a resolution of 1 km at the continental scale and 100 m/cell for most individual countries in Africa and Asia, as well as in South and Central America. It was initiated in 2013 by combining the AfriPop, AsiaPop and AmeriPop population mapping projects (Gaughan et al. 2013, Tatem et al. 2013). The population grids are the result of dasymetric modelling performed for each country separately, based on census data (or official population estimates) at the finest level of aggregation available for each country and using remotely sensed and geospatial datasets (e.g. settlement locations, settlement extents, land cover, roads, building maps, health facility locations, satellite night lights, vegetation, topography, refugee camps) as ancillary data. The dasymetric modelling follows the procedure described by Stevens et al. (2015).

The WorldPop project (2019) also provides data for mapping births and pregnancies (Tatem et al. 2014), age and sex structure (Alegana et al. 2015) and population dynamics based on cell phone data (Deville et al. 2014). The recent initiative (WorldPop Archives) aims at providing uniform, resampled and co-registered spatial data layers at two different resolutions (3 and 30 arc-second) ready-to-use for modelling and mapping population distribution (Lloyd et al. 2018).

A 100 m/cell population grid (Gallego 2010, Gallego et al. 2011) has been developed for the European Union (EU) countries; it is available from the European Environment Agency data warehouse (EEA 2019). This dataset is the result of dasymetric modelling calculated using population data from the 2000/2001 round of censuses aggregated to nearly 115,000 areal units and the 100 m/cell raster version of CORINE Land Cover 2000 as ancillary data.

Batista e Silva et al. (2013) reported on producing 2006 population estimates for 100 × 100 m cells for the territory of the EU27 (except Greece) and Andorra, Norway, Iceland, San Marino, Monaco, Liechtenstein and the Vatican City. The authors tested several approaches to establishing the relation between population data and ancillary variables, as well as several different ancillary datasets, to check whether using more detailed ancillary data in dasymetric mapping leads to improved accuracy. The final map uses population data aggregated to 100,925 local administrative units (LAU2) downloaded from EUROSTAT, together with a refined version of CORINE Land Cover 2006 and information on the soil sealing degree. The final map is only made available in PDF format as supplementary material to the paper by Batista e Silva et al. (2013).

In North America, the WorldPop project (WorldPop 2019) provides data for Mexico, and a few separate projects provide data for the United States. Until recently, the only available data for the entire U.S. were the population grids developed by SEDAC as a result of areal interpolation of U.S. census data. These grids are available at 30 arc-second (approximately 1 km at the equator) resolution for the U.S. for 1990, 2000 and 2010, and at 7.5 arc-second (approximately 250 m) resolution for major metropolitan statistical areas (MSAs) for 1990 and 2000. Although they are prepared for 1990, 2000 and 2010, they cannot be used for direct comparison studies due to differing formats (the 2000 dataset uses integer counts, whereas the 1990 and 2010 datasets use real number counts, which makes them non-comparable). Also, the ~1 km resolution is not sufficient for many practical applications.

Since 2014 another source of population grids has been the SocScape project (Dmowska, Stepinski 2017a, Dmowska et al. 2017). The SocScape project makes two types of products available for the conterminous U.S.:

30 m high resolution grids of the entire population and of race/ethnicity subpopulations for 1990, 2000 and 2010, and

racial diversity grids.

The high resolution grids are the product of dasymetric modelling performed on block-level census data (the smallest level of aggregation in the U.S.) and the 30 m National Land Cover Datasets (NLCD). The racial diversity grids show the spatial character of racial diversity across the U.S. in the form of a three-dimensional classification of grid cells based on population density, dominant race and diversity level expressed by standardized entropy (Dmowska et al. 2017). SocScape data are the only publicly available broad-scale population grids comparable between years that can be used for quantitative assessment of population changes.

Although dasymetric modelling is a well-known technique and straightforward to apply, its application to producing broad-scale, high resolution grids presents several challenges. The main challenge is the availability of high resolution ancillary data, which must cover the entire area of interest with uniform coverage and quality and must be comparable between different years if the population grids are intended to be used for change analysis. Producing broad-scale high resolution maps requires the development of an efficient, fully automated algorithm to work with large datasets, so calculations can be performed within a reasonable time.

This paper is not focused on developing new techniques or testing different types of ancillary data for dasymetric modelling, but on developing a fully automated computational framework and applying it to provide actual, broad-scale, multi-year comparable population grids that can serve as input to a wide range of applications. This paper consists of three parts:

an extensive review of the literature on constructing broad-scale population grids,

a description of the development of an R-based implementation of the computational framework, and

examples of the resultant population grids.

In addition, Section 2 briefly describes the data and presents a short overview of the methodology used to produce the high-resolution, multi-year compatible population grids for the entire U.S., which are part of the SocScape project. Final conclusions are drawn in Section 4.

Data and methods

Producing population grids that are temporally comparable requires the following conditions on the data:

the use of contemporaneously collected population and ancillary data to construct a grid for a given epoch (for example, 2000 census data should be coupled with circa-2000 land cover data), and

ancillary data should have the same meaning over all epochs (for example, land cover data at different years should have the same categories).

Data in the SocScape project fulfils those conditions, thus making the resultant population grids comparable between different years.

Population data

The source of population data in the SocScape project is the 1990, 2000, and 2010 U.S. Decennial Census data aggregated at the block level. The block is the smallest aggregation unit of the U.S. Census. The data consist of two components: shapefiles (TIGER/Line files) containing block geographical boundaries, and summary text files which list population data for each block. The data have been downloaded from the National Historical Geographic Information System (NHGIS) (MPC 2019). The NHGIS project distributes population tables and shapefiles with an additional key identifier, making it easier to join boundaries with the attribute tabular data. Tabular data are available as a single file for the entire U.S., whereas shapefiles are provided at the state level. The sizes of the shapefiles containing block boundaries and their population counts vary from 34 MB for the District of Columbia to 4037 MB for the state of California; the overall size is 39 GB. The number of blocks is ~7.15 million (1990), ~8.2 million (2000), and ~11.15 million (2010).

Ancillary data

The SocScape project uses land cover datasets as ancillary data. This choice is dictated by the fact that land cover is the only ancillary data for which a single dataset, the National Land Cover Dataset (NLCD), covers the entire conterminous U.S. (CONUS) at the same spatial resolution (30 m per cell) and the same quality.

NLCD datasets are available for 1992, 2001, 2006 and 2011. However, NLCD 1992 has a legend which is incompatible with later editions; for comparisons between 1992 and 2001, the NLCD 1992/2001 Retrofit Land Cover Change Product should be used. It is a product based on the Anderson Level 1 classification and consists of 8 unchanged categories (open water, urban, barren, forest, grass/shrub, agriculture, wetlands, ice/snow) and 55 categories (combinations of those 8) indicating changes between 1992 and 2001. Based on those categories, the 1992/2001 Retrofit Land Cover Change Product can be divided into two separate maps (for 1992 and for 2001), each representing the 8 categories of land cover types. NLCD 2001 and NLCD 2011 consist of 16 land cover categories (including 4 categories of developed areas; see Fig. 3 for the legend). The ancillary data are based on the 1992 land cover map derived from the 1992/2001 Retrofit Land Cover Change Product and on the 2001 and 2011 editions of NLCD (see, for example, panels A–C in Fig. 3). Land cover data from 1992, 2001 and 2011 match closely the population data from the 1990, 2000 and 2010 U.S. Decennial Censuses. To transform all three maps to a common legend, the land cover maps are reclassified to just three categories: urban (representing the 4 developed categories in NLCD and the urban category in the 1992 land cover map), vegetation (representing forest and agriculture categories), and uninhabited (representing water, ice/snow and barren land categories). An example of the ancillary data is shown in Fig. 3. These reclassified maps are used as ancillary data for the dasymetric model.
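In the SocScape workflow this reclassification is carried out in GRASS GIS during pre-processing (described later), but the logic can be illustrated with a short R sketch. The file names, and the assignment of the shrub, grassland and wetland classes (which the text does not specify), are assumptions made here for illustration only:

```r
# Minimal sketch: reclassify an NLCD tile into the 3-class ancillary legend.
library(raster)

nlcd <- raster("nlcd_2011_county.tif")   # hypothetical county-level NLCD tile

# target codes: 0 = uninhabited, 1 = urban, 2 = vegetation
rcl <- matrix(c(
  11, 0,  12, 0,  31, 0,          # open water, ice/snow, barren    -> uninhabited
  21, 1,  22, 1,  23, 1,  24, 1,  # four developed classes          -> urban
  41, 2,  42, 2,  43, 2,          # forest                          -> vegetation
  81, 2,  82, 2,                  # pasture/hay, cultivated crops   -> vegetation
  52, 2,  71, 2,  90, 2,  95, 2   # shrub, grassland, wetlands (assumed vegetation)
), ncol = 2, byrow = TRUE)

ancillary <- reclassify(nlcd, rcl)
writeRaster(ancillary, "ancillary_2011_county.tif", overwrite = TRUE)
```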

Methods

The overall dasymetric model follows the methodology introduced by Dmowska and Stepinski (2017a) to obtain the 2010 population grid. The only difference is the use of 3-class land cover data instead of a combination of land cover/land use classes as ancillary data. According to this methodology, the population of each block is redistributed to its cells using block-specific weights assigned to cells belonging to different ancillary classes. The weights are assigned based on the relative density of population for each ancillary class and the area of each block occupied by each class (Mennis 2003). The population in each cell is calculated by multiplying the number of people in the block by the weight assigned to the cell based on its ancillary class.

An important step in dasymetric modelling is the establishment of the relationship between ancillary and population data. The presented model uses a set of characteristic (or representative) values of population density for each ancillary class (Mennis, Hultgren 2006). The representative population density for each class is established using a set of blocks (selected from the entire conterminous U.S.) having relatively homogeneous land cover (90% for the urban class and 95% for the vegetation class). The representative density for a particular ancillary class is calculated by dividing the sum of the population living in the selected blocks by the overall area of these blocks. Representative densities are required to establish the relative densities of population used to calculate block-specific weights. The relative density of population for each ancillary class is calculated by dividing the representative density for this class by the sum of representative densities over all ancillary classes.
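The quantities described above can be written compactly; the notation is introduced here for illustration and is one consistent reading of the verbal description, not a formula reproduced from the cited papers:

```latex
D_c = \frac{\sum_{b \in S_c} P_b}{\sum_{b \in S_c} A_b}, \qquad
d_c = \frac{D_c}{\sum_k D_k}, \qquad
w_{bc} = \frac{d_c}{\sum_k d_k \, n_{bk}}, \qquad
p_{bc} = P_b \, w_{bc},
```

where S_c is the set of nearly homogeneous blocks selected for ancillary class c, P_b and A_b are the population and area of block b, D_c and d_c are the representative and relative densities, n_{bk} is the number of 30 m cells of class k in block b, w_{bc} is the block-specific weight of a cell of class c, and p_{bc} is the population assigned to such a cell. With these definitions the cell values within each block sum back to P_b.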

Computation and results
R implementation of dasymetric modelling

The major challenge in calculating a 30 m dasymetric model of population density for the entire conterminous U.S. is the size of the input and output data. The population grids provided by the SocScape project are the result of disaggregating ~11 million census blocks into over 8 billion (8,651,157,015) grid cells. The choice of output resolution (30 m) is dictated by the resolution of the ancillary data, as it is most convenient to disaggregate census data to the resolution of the ancillary data.

Traditionally, dasymetric modelling has been computed in a GIS environment, such as ESRI ArcGIS (Sleeter, Gould 2007), QGIS (Mileu, Margarida 2018) or GRASS GIS (Dmowska, Stepinski 2014). However, for a broad-scale model such an approach is computationally inefficient. Processing such an amount of data requires a fully automated and flexible computational environment. The dasymetric model used in SocScape is implemented in R (R Core Team 2018). R is a comprehensive computational environment that includes libraries for working with different types of data: tabular data and geospatial data (libraries sp, sf, raster). It also provides tools for binding to external data sources such as GRASS GIS (library rgrass7), GDAL (library rgdal) or standard relational databases (libraries DBI, RSQLite). R provides some advantages over GIS software and other programming languages; the main advantage is that it allows building an efficient, flexible and fully automated computational environment to work with large datasets without advanced programming skills.

The key factors in calculating the broad-scale model are managing the data storage requirements and controlling the computation time (see Fig. 1).

Fig. 1

Process of handling large datasets in R.

In order to handle a large dataset in R, the geospatial data is first divided into separate counties using the region concept in GRASS GIS; in GRASS GIS, the region settings determine the spatial extent and resolution of the grid. The geospatial data is next read into R and stored in a SpatialGridDataFrame object. SpatialGridDataFrame is one of the spatial objects provided by the sp library for working with spatial data in R (Pebesma, Bivand 2005). It integrates the bounding box, the Coordinate Reference System and the grid topology with attribute data stored in a data.frame (R's tabular format). Working with a SpatialGridDataFrame object allows integrating the spatial content (census boundaries, ancillary data) with the census population data into a single relational model, performing calculations at the tabular level (data.frame) and, in the last step, propagating the data into grid cells. The dasymetric modelling is performed in R for each county separately. In the last step, the dasymetric maps for individual counties are joined into a map for the entire conterminous United States.

Figure 2 shows in detail the R-based implementation of dasymetric modelling in the SocScape project. The calculation process has been divided into five steps:

pre-processing of Census and geospatial data,

the establishment of the relationship between ancillary and population data,

performing dasymetric modelling,

propagating the dasymetric model to geospatial grids,

post-processing: preparing high-resolution population maps for the entire U.S.

Fig. 2

Framework of the dasymetric modelling calculation in R.

The whole procedure is implemented in R. In addition to R, GRASS GIS 7.0, SQLite and the GDAL library have been used in the pre-processing and post-processing steps. The computational framework consists of several scripts used for reading U.S. Census text files into an SQLite database, reading geospatial data from GRASS GIS into a SpatialGridDataFrame object in R, performing dasymetric modelling, exporting population grids to GeoTIFF and joining the GeoTIFFs into a U.S.-wide map.

The first step of the calculation procedure is the pre-processing of population and ancillary data. In this step, U.S.-wide census block level data are imported from text files into an SQLite database using R tools designed to work directly with a database (libraries DBI, RSQLite). SQLite is a public-domain, single-user relational database management system which stores the entire database as a single cross-platform file and implements a subset of the SQL-92 standard, including the core table creation, updating, insertion, and selection operations. The RSQLite package embeds the SQLite database engine in R, providing a DBI-compliant interface (Müller et al. 2018).

RSQLite provides functions to write data from R to SQLite, to read data directly from a database table into a data.frame in R, and to perform SQL queries. This functionality is used to extract the population data for a particular county and read it from the database directly into an R data.frame object.
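A minimal sketch of this step is given below; the file, table and column names, the CSV input format and the county FIPS code are illustrative assumptions, not the actual SocScape schema:

```r
# Minimal sketch of the census pre-processing step with DBI/RSQLite.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "census_blocks.sqlite")

# import the U.S.-wide block-level table (here assumed to be a CSV export)
blocks <- read.csv("us_blocks_2010.csv", stringsAsFactors = FALSE)
dbWriteTable(con, "blocks2010", blocks, overwrite = TRUE)

# later: pull the rows for a single county directly into a data.frame
county_pop <- dbGetQuery(
  con,
  "SELECT block_id, pop FROM blocks2010 WHERE county_fips = '48085'"
)

dbDisconnect(con)
```

Because SQLite keeps the whole database in a single file, the imported U.S.-wide table can be reused across all county-level runs without re-reading the source text files.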

The geospatial data is pre-processed using GRASS GIS software (a step not shown in Fig. 2) before it is imported into a SpatialGridDataFrame object in R. Block level census boundaries are available as state level shapefiles; those shapefiles are imported into GRASS GIS, rasterized to match the NLCD grid topology and divided into separate counties. Land cover data, used here as the ancillary dataset, are stored as U.S.-wide files. Pre-processing of the ancillary data includes extracting the 1992 land cover data from the NLCD 1992/2001 Retrofit Land Cover Change Product, reclassifying the NLCD maps into 3 classes (uninhabited, urban, vegetation) and dividing the data into separate counties using the region concept in GRASS GIS. The rasterized census block boundaries and the 3-class ancillary data are imported into a SpatialGridDataFrame object in R using the rgrass7 package (Bivand 2017). The rgrass7 package provides an interpreted interface between the GRASS geographical information system, version 7, and R (Bivand 2017). The interface uses classes defined in the sp package to hold spatial data (Pebesma, Bivand 2005) and allows reading raster data directly from GRASS GIS into a SpatialGridDataFrame (SGDF) object in R. The SGDF object for each county, with two layers (census boundaries and ancillary data), is stored as an rds file.
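A minimal sketch of this import step, assuming hypothetical GRASS raster map names and that R is running inside a GRASS session whose region is already set to the current county:

```r
# Minimal sketch of importing the two rasterized county layers from GRASS GIS.
library(sp)
library(rgrass7)
# use_sp()  # with newer rgrass7 releases, uncomment to select sp-class output

# read both county rasters into one SpatialGridDataFrame (one column per map)
sgdf <- readRAST(c("blocks_48085", "ancillary_48085"))
names(sgdf@data) <- c("block_id", "anc_class")

# persist the two-layer SGDF so the modelling step can restore it quickly
saveRDS(sgdf, "county_48085.rds")
```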

In the next step, the data for a particular county is read into R to perform dasymetric modelling. The population data is extracted from the SQLite database and read directly into a data.frame object in R. The geospatial data is restored from the rds file containing the SGDF object with census boundaries and ancillary data. Before performing dasymetric modelling, the ancillary data are updated based on the population data by assigning the uninhabited class to blocks with population equal to 0.

Next, the area of each ancillary class in each block is calculated and stored in a data.frame. Notice that at this point two data.frames are available: one containing block id and population data, and the second containing block id and the area of each ancillary class in this block. Using these two types of information stored in data.frames, the weights to redistribute the population among the different ancillary classes are calculated for each block. Then the population in each block is multiplied by the weights. This results in a data.frame with the number of people assigned to each type of ancillary class for each block. The areas, weights and results of the dasymetric procedure are stored in data.frames and written to the SQLite database to be used for further analysis.
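A minimal sketch of this per-county calculation at the data.frame level is shown below; the relative densities, the county FIPS code and all table, column and file names are illustrative assumptions rather than the actual SocScape values:

```r
# Minimal sketch of the per-county dasymetric step at the data.frame level.
library(sp)
library(DBI)
library(RSQLite)

sgdf <- readRDS("county_48085.rds")          # 30 m cells: block_id + anc_class

con <- dbConnect(SQLite(), "census_blocks.sqlite")
county_pop <- dbGetQuery(
  con, "SELECT block_id, pop FROM blocks2010 WHERE county_fips = '48085'"
)
dbDisconnect(con)

# assumed relative densities per ancillary class: 0 uninhabited, 1 urban, 2 vegetation
d <- c("0" = 0, "1" = 0.9, "2" = 0.1)

# number of 30 m cells of each ancillary class in each block
cells <- as.data.frame(table(block_id  = sgdf$block_id,
                             anc_class = sgdf$anc_class))
cells <- cells[cells$Freq > 0, ]
cells$block_id      <- as.character(cells$block_id)
county_pop$block_id <- as.character(county_pop$block_id)

# block-specific weight of a cell of class c: d_c / sum_k(d_k * n_bk),
# so that the cell values within a block sum back to the block population
cells$dc <- d[as.character(cells$anc_class)]
denom    <- tapply(cells$dc * cells$Freq, cells$block_id, sum)
cells$w  <- cells$dc / denom[cells$block_id]

# population per cell = block population * weight of the cell's class
cells <- merge(cells, county_pop, by = "block_id")
cells$pop_cell <- ifelse(is.finite(cells$w), cells$pop * cells$w, 0)

# propagate the result back to the grid by matching (block_id, ancillary class)
key <- paste(cells$block_id, cells$anc_class)
sgdf$pop <- cells$pop_cell[match(paste(sgdf$block_id, sgdf$anc_class), key)]
```

The division by the block-specific denominator preserves the block totals, which is the mass-preserving property the text describes.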

The result of dasymetric modelling is also propagated into the grid and stored as an additional layer (together with the ancillary information and census block boundaries) in the SpatialGridDataFrame object. In the last step, the dasymetric population grid for each county is exported from R to GeoTIFF using the rgdal library. The rgdal package provides bindings to the Geospatial Data Abstraction Library (GDAL) (>= 1.11.4) and access to projection/transformation operations from the PROJ.4 library (Bivand et al. 2018). Finally, the GDAL library is used to create the conterminous U.S. population grid from the county GeoTIFFs. First, a Virtual Dataset (VRT), which is a mosaic of the county GeoTIFFs, is built using the gdalbuildvrt program provided by GDAL. Next, the VRT is converted to a U.S.-wide GeoTIFF using the gdal_translate program provided by GDAL. The result is a U.S.-wide population grid at 30 m resolution.
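A minimal sketch of the export and mosaicking step, with illustrative file names; the GDAL command-line utilities are called here through system():

```r
# Minimal sketch: export the county grid and mosaic all counties with GDAL.
library(rgdal)

# write the county population layer to a GeoTIFF
writeGDAL(sgdf["pop"], "pop_48085.tif", drivername = "GTiff")

# build a VRT mosaic of all county GeoTIFFs, then convert it to one U.S.-wide GeoTIFF
system("gdalbuildvrt conus_pop.vrt pop_*.tif")
system("gdal_translate -of GTiff -co BIGTIFF=YES conus_pop.vrt conus_pop_30m.tif")
```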

The calculation of a dasymetric model for a single county (containing 10,000 blocks) takes 14 seconds. In comparison, calculations using the dasymetric modelling toolbox (Sleeter, Gould 2007) in ArcGIS take 600 seconds. The whole procedure, from the pre-processing steps to obtaining the final dasymetric map for the entire conterminous U.S., takes 55 h using a PC with a 3.4 GHz 4-core Intel processor and 16 GB of memory running Linux. The most time-consuming step is data pre-processing (37 h); determining the relation between population and ancillary data and performing the dasymetric modelling takes 6 h, and creating one map from the counties' dasymetric models takes 12 h.

Examples of U.S.-wide population grids

The described implementation of the dasymetric model has been used to produce the high resolution, multi-year comparable, U.S.-wide population grids which are part of the SocScape (Social Landscape) project. This project provides open access to high resolution (30 m) population, subpopulation (separate race/ethnicity groups) and racial diversity grids for the entire conterminous United States for 1990, 2000 and 2010 (Dmowska, Stepinski 2017b, Dmowska et al. 2017). The SocScape project consists of two parts: a GeoWeb application designed to explore the U.S.-wide population and racial diversity grids, and the SocScape data website, which provides data for each county and for 363 MSAs as zip archives.

Figure 3 shows an example of population grids for the area centred on the city of Frisco, TX, located in Collin and Denton counties. Frisco is a part of the Dallas-Fort Worth metropolitan area and was considered the fastest-growing city in the United States from 2000 to 2009, with a population of 116,989 at the 2010 census. Figure 3 is divided into 12 panels arranged into 3 columns (corresponding to 1990, 2000 and 2010, respectively) and 4 rows (corresponding to different types of data). Panels A–I show the population and ancillary data used as input to dasymetric modelling: panels A–C present land cover data with the original legends, panels D–F present the ancillary data reclassified into three categories that are fully comparable between years, and panels G–I show census block level population data. The resulting population grids are presented in panels J–L.

Fig. 3

The city of Frisco, TX, located in Collin and Denton counties.

(A) Land cover map for 1992 based on 1992/2001 Retrofit Land Cover Change Product. (B) NLCD 2001. (C) NLCD 2011. (D–F) Land cover data reclassified into 3 categories. (G–I) Block level population data. (J–L) Population density shown in 30 m resolution grids.

NLCD 2001, 2011: 11 – open water; 12 – ice/snow; 21 – developed, open space; 22 – developed, low intensity; 23 – developed, medium intensity; 24 – developed, high intensity; 31 – barren land; 41 – deciduous forest; 42 – evergreen forest; 43 – mixed forest; 52 – shrub/scrub; 71 – grassland; 81 – pasture/hay; 82 – cultivated crops; 90 – woody wetlands; 95 – emergent wetlands.

LC1992: 1 – open water; 2 – urban; 3 – barren; 4 – forest; 5 – grass/shrub; 6 – agriculture; 7 – wetlands; 8 – ice/snow.

Ancillary data: 0 – uninhabited; 1 – urban areas; 2 – nonurban areas.

In this example, the census block data and the population grids show the main features of the population distribution in a similar way. The main limitation of the block level data is that they cannot be used to quantitatively assess changes in population distribution: the boundaries of the aggregation units changed between 1990 and 2010, and the urbanization process, which is visible in the land cover maps, caused an increase in the number of blocks in the presented area from 2,100 in 1990 to 10,360 in 2010. On the other hand, the population grids can be directly used to assess changes in population distribution, as they are produced based on multi-year comparable ancillary data.

Conclusion

This paper reviewed the literature on the production of broad-scale population grids and reported on the R implementation of an automated framework for performing dasymetric modelling to produce such grids. The described implementation of the dasymetric model has been used to produce high resolution, multi-year comparable, U.S.-wide population grids for 1990, 2000 and 2010.

The main advantages of using R to perform the dasymetric calculations are:

no advanced programming skills are required,

fewer processing steps are required than when using GIS software,

no intermediate layers are produced,

increased flexibility and automation, and

easy extension to variables other than total population.

The framework has been implemented to work with the U.S. Decennial Census population data available for 1990, 2000 and 2010. However, it can easily be modified to work with other sources of data and with other levels of aggregation (e.g. census tracts, block groups). The practical advantage of the presented framework has already been illustrated by computing high resolution demographic grids for race/ethnicity subpopulations using the weights established by the population model. Preparing such U.S.-wide demographic grids with the already established weights takes 13 h and does not require any pre-processing steps: the weights established by the population model are stored in the SQLite database, and the other demographic grids can be calculated by importing the U.S.-wide block level data into the SQLite database and multiplying the counts by those weights. The presented framework can also be used to prepare high resolution population grids for 2020, when the U.S. Decennial Census data become available.
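As a minimal sketch, and assuming hypothetical table and column names (not the actual SocScape schema), reusing the stored weights for another variable reduces to a single join and multiplication:

```r
# Minimal sketch of reusing the stored weights to grid another variable.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), "census_blocks.sqlite")

# weights2010: block_id, anc_class, w   (written by the population model)
# blocks2010:  block_id, hispanic, ...  (U.S.-wide block level counts)
sub <- dbGetQuery(con, "
  SELECT w.block_id, w.anc_class, w.w * b.hispanic AS pop_cell
  FROM   weights2010 AS w
  JOIN   blocks2010  AS b ON b.block_id = w.block_id
")
dbDisconnect(con)

# pop_cell is then propagated to grid cells and exported exactly as for
# the total population grids
```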

The presented framework can also easily be extended to calculate other types of maps which use the results of dasymetric modelling as input. Examples of such maps are racial diversity maps (Dmowska, Stepinski 2017b, Dmowska et al. 2017) and racial dot maps (Dmowska, Stepinski 2019).
