There is a crisis of replicability in scientific results that involve spatial data because important calculations are often carried out by hand in proprietary software. Without code to serve as a paper trail, important steps in the analysis become difficult to check for errors, alter for testing , or even to really document adequately. Performing spatial tasks by hand is tedious and discourages researchers from performing standard robustness checks, adding additional measures, or trying new things. One solution to both problems is to encourage the use of spatially aware languages and databases solutions. This brief post outlines how to perform five common spatial tasks using one of the best tools available, the PostGIS spatial extension to the popular and free PostgresSQL database.

# Starting with a Hexagon Lattice Base Layer

For these examples, I’m using data collected for a project with Kristen Harkness, on an insurgency that took place 60 years ago in Colonial Kenya called the Mau Mau Rebellion.^{1}Rex Douglass and Kristen Harkness, The Cost of Silence: Collective Punishment and Intelligence Extortion during the Kenya Emergency. Working Paper. Our base layer is a hexagon lattice covering the 1950s borders of Kenya. It consists of 6,300 regular hexagons each 11km wide. I created this lattice and added it as a table called “hex_grid” using a function described in my previous post.

# Five Common Spatial Tasks

## One: Counting Points within a Boundary

On the left is a map of 4,703 buildings listed by name, type, latitude, and longitude by a 1964 Gazeteer.^{2}Kenya; official standard names approved by the United States Board on Geographic Names. United States. Washington [U.S. Govt. Print. Off.] 1964. The text was OCR’d, cleaned and processed into a data frame and outputted as a shapefile in Python. I imported the shapefile into PostGIS using the Shapefile Import/Export Manager. It was created as a table called “historical_gazetteer” where each row is a building and each building’s geometry is single point.

On the right is the count of buildings within each hexagon of the base layer. It takes only three lines of SQL to create which I’ve expanded greatly below to provide explanatory comments. It works by first joining the hexagon table on the historical_gazetteer table where rows in one spatially overlap with rows in the other. It then goes hexagon by hexagon, summing the count of matching gazetteer rows that are listed as buildings.

ALTER TABLE hex_grid ADD COLUMN gaz_building float; --Add a column to the hexagon table UPDATE hex_grid SET gaz_building = 0.0; --Initialize that column to a count of zero UPDATE hex_grid SET gaz_building = temporary_holder.building_count FROM ( SELECT hex_grid.gid, --Keep the ID for each hexagon to help specify which rows to update sum( --The aggregation function used in collapsing to just hexagons historical_gazetteer.building --Just a vector of ones meaning yes if building ) AS building_count --Give the total a name for easy reference FROM ( hex_grid INNER JOIN -- Make a new table where each pair of hexagon-gazetteer point is a row historical_gazetteer ON ST_Intersects(hex_grid.geom, historical_gazetteer.geom) -- Only for pairs that occupy the same space ) GROUP BY hex_grid.gid -- Collapse hexagon-gazetteer point pairs to just hexagons ) AS temporary_holder --The placeholder name for the aggregate results WHERE hex_grid.gid=temporary_holder.gid --Update rows in the hexagon table with matching id number

## Two: Measure Vectors in a Boundary

On the left is a map of roads in 1950s Kenya produced from a vectorization of modern road maps cross referenced against a contemporary road atlas in a process I describe here. It is imported to PostGIS from a shapefile, where each row is an individual road segment with a geometry that is a Linestring.

On the right is the length of road vectors passing through each hexagon. It works by first clipping road vectors to the boundaries of a given hexagon, calculating their length, and then totaling them.

ALTER TABLE hex_grid ADD COLUMN road_length float; -Add a column to the hexagon table UPDATE hex_grid SET road_length = 0.0; --Initialize that column to a count of zero UPDATE hex_grid SET road_length = temporary_holder.roadlength FROM ( SELECT hex.gid, sum( --Total those bits for each hexagon ST_Length( --Take the length of the bits of road within the hexagon ST_Intersection(hex_grid.geom, roads.geom ) --Clip intersecting road vectors to just the the space of the current hexagon ) ) AS roadlength -- Give a name to the sum results FROM ( hex_grid JOIN kenya_roads ON ST_Intersects(hex_grid.geom, kenya_roads.geom) -- Only for pairs that occupy the same space ) GROUP BY hex.gid -- Collapse hexagon-gazetteer point pairs to just hexagons ) AS temporary_holder --The placeholder name for the aggregate results WHERE hex_grid.gid=temporary_holder.gid --Update rows in the hexagon table with matching id number

## Three: Measuring Distance to Nearest Something

On the left is a map of the same road segments described above.

On the right is a map of each hexagon’s distance to the nearest road, with darker areas further from any transportation infrastructure. The code takes advantage of PostGIS’s ability to use a special index called a GiST or Generalized Search Tree for fast nearest neighbor searches. For each hexagon, it sorts all of the roads by their distance from that hexagon and then returns the top 1 nearest result. This can be easily generalize to be the k nearest neighbors.

ALTER TABLE hex_grid ADD COLUMN road_nearest float; UPDATE hex_grid SET road_nearest = 0.0; UPDATE hex_grid SET road_nearest = rex.length FROM ( SELECT hex_grid.gid, ST_DISTANCE( --Calculate distance hex_grid.the_geom, --From this hexagon (SELECT --To its nearest road neghbor kenya_roads.geom --Each road's location FROM kenya_roads --From the road table ORDER BY hex_grid.the_geom <-> kenya_roads.geom --Ordered by each road's distance from this hexagon's centroid LIMIT 1) --Take only the single nearest neighbor ) AS length --Assign a name to the distance FROM hex_grid --Pull the hexagon ID and location above from the hex_grid table ) AS temporary_holder --The placeholder name for the aggregate results WHERE hex_grid.gid=rex.gid --Update rows in the hexagon table with matching id number

## Four: Take the Property of a Intersecting Polygon

On the left is a map of 1950s Kenya district boundaries. Imported into postgis as a shapefile, each row is a district with a geometry type of polygon.

On the right is a map of each hexagon taking the value of the district that overlaps with the centroid of the hexagon. Hexagons may overlap more than one district, but they will be assigned to the one that they almost always overlap the most with.

ALTER TABLE hex_grid ADD COLUMN political_district varchar; --Add a column to the hexagon table UPDATE hex_grid SET political_district = ''; --Initialize that column to an empty string UPDATE hex_grid SET political_district = temporary_holder.district FROM ( SELECT hex_grid.gid, (SELECT kenya_districts1962.district FROM kenya_districts1962 WHERE ST_Intersects( --True if the district and hexagon centroid overlap ST_Centroid(hex_grid.the_geom), --Take the hexagon's centroid kenya_districts1962.geom ) LIMIT 1 ) AS districtname --Give a name to the result FROM hex_grid ) AS temporary_holder WHERE hex_grid.gid=temporary_holder.gid

## Five: Calculate Raster Statistics within a Polygon

Finally in the last example, on the left is a raster image from an elevation DEM clipped to the boundary of Kenya. PostGIS comes with a helper app for importing the raster as a table in the database. Rasters represent regularly spaced measurements across space as square pixels.

On the right is is a map of elevation aggregated at the hexagon level. To reduce processing time and warning messages thrown, it first subsets to just those hexagons that overlap with at least part of the raster. It then clips the raster to each hexagon’s boundaries, and calculates raster summary statistics. ST_SummaryStats will calculate the count of pixels as well as the sum, mean, standard deviation, min, and max of pixel values. We use only the sum statistic, which we could further divide by the count of pixels to get an average elevation. I only calculate the total here since polygons are equally sized with approximately the same number of pixels in the denominator.

ALTER TABLE hex_grid ADD COLUMN elevation_sum float; UPDATE hex_grid SET elevation_sum = 0.0; UPDATE hex_grid SET elevation_sum = temporary_holder.elevation FROM ( SELECT hex_grid.gid, (ST_SummaryStats( --Function that calculates several statistics for a raster ST_Union( ST_Clip(rast,the_geom) --Clip the raster to the current polygon ) )).sum --Access only the summation statistic as elevation --Rename the result FROM elevation, --From the hex_grid WHERE ST_Intersects(the_geom,rast) --Subset to only the hexagons covered by the raster to reduce errors/time GROUP BY hex_grid.gid --Test ) AS temporary_holder WHERE hex_grid.gid=temporary_holder.gid

Foot Notes

1. | ↑ | Rex Douglass and Kristen Harkness, The Cost of Silence: Collective Punishment and Intelligence Extortion during the Kenya Emergency. Working Paper. |

2. | ↑ | Kenya; official standard names approved by the United States Board on Geographic Names. United States. Washington [U.S. Govt. Print. Off.] 1964. |

## Leave a Reply