Five Fast and Replicable Spatial Tasks with PostGIS

FacebookTwitterGoogle+RedditLinkedIn

thumbnailThere is a crisis of replicability in scientific results that involve spatial data because important calculations are often carried out by hand in proprietary software.  Without code to serve as a paper trail, important steps in the analysis become difficult to check for errors,  alter for testing , or even to really document adequately. Performing spatial tasks by hand is tedious and discourages researchers from performing standard robustness checks, adding additional measures, or trying new things. One solution to both problems is to encourage the use of spatially aware languages and databases solutions.  This brief post outlines how to perform five common spatial tasks using one of the best tools available, the  PostGIS spatial extension to the popular and free PostgresSQL database.

Starting with a Hexagon Lattice Base Layer

For these examples, I’m using data collected for a project with Kristen Harkness, on an insurgency that took place 60 years ago in Colonial Kenya called the Mau Mau Rebellion. 1Rex Douglass and Kristen Harkness, The Cost of Silence: Collective Punishment and Intelligence Extortion during the Kenya Emergency. Working Paper. Our base layer is a hexagon lattice covering the 1950s borders of Kenya. It consists of 6,300 regular hexagons each 11km wide.  I created this lattice and added it as a table called “hex_grid” using a function described in my previous post.

hexagons

 

Five Common Spatial Tasks

One: Counting Points within a Boundary

On the left is a map of 4,703 buildings listed by name, type, latitude, and longitude by a 1964 Gazeteer.2Kenya; official standard names approved by the United States Board on Geographic Names. United States. Washington [U.S. Govt. Print. Off.] 1964. The text was OCR’d, cleaned and processed into a data frame and outputted as a shapefile in Python. I imported the shapefile into PostGIS using  the Shapefile Import/Export Manager. It was created as a table called “historical_gazetteer” where each row is a building and each building’s geometry is single point.

On the right is the count of buildings within each hexagon of the base layer. It takes only three lines of SQL to create which I’ve expanded greatly below to provide explanatory comments. It works by first joining the hexagon table on the historical_gazetteer table where rows in one spatially overlap with rows in the other.  It then goes hexagon by hexagon, summing the count of matching gazetteer rows that are listed as buildings.

building_point becomes building_hex
ALTER TABLE hex_grid ADD COLUMN gaz_building float; --Add a column to the hexagon table
UPDATE hex_grid SET gaz_building = 0.0; --Initialize that column to a count of zero
UPDATE hex_grid SET gaz_building = temporary_holder.building_count
	FROM
	(
		SELECT
			hex_grid.gid, --Keep the ID for each hexagon to help specify which rows to update
			sum( --The aggregation function used in collapsing to just hexagons
			    historical_gazetteer.building --Just a vector of ones meaning yes if building
			    ) AS building_count --Give the total a name for easy reference
		FROM
		(
			hex_grid
			INNER JOIN -- Make a new table where each pair of hexagon-gazetteer point is a row
			historical_gazetteer
			ON ST_Intersects(hex_grid.geom, historical_gazetteer.geom) -- Only for pairs that occupy the same space
		)
		GROUP BY hex_grid.gid -- Collapse hexagon-gazetteer point pairs to just hexagons
	) AS temporary_holder --The placeholder name for the aggregate results
	WHERE hex_grid.gid=temporary_holder.gid --Update rows in the hexagon table with matching id number

Two: Measure Vectors in a Boundary

On the left is a map of roads in 1950s Kenya produced from a vectorization of modern road maps cross referenced against a contemporary road atlas in a process I describe here. It is imported to PostGIS from a shapefile, where each row is an individual road segment with a geometry that is a Linestring.

On the right is the length of road vectors passing through each hexagon. It works by first clipping road vectors to the boundaries of a given hexagon, calculating their length, and then totaling them.

roads_vector becomes roads_hex

 

ALTER TABLE hex_grid ADD COLUMN road_length float; -Add a column to the hexagon table
UPDATE hex_grid SET road_length = 0.0; --Initialize that column to a count of zero
UPDATE hex_grid SET road_length = temporary_holder.roadlength
	FROM
	(
		SELECT
			hex.gid,
			sum( --Total those bits for each hexagon
			    ST_Length( --Take the length of the bits of road within the hexagon
			              ST_Intersection(hex_grid.geom, roads.geom ) --Clip intersecting road vectors to just the the space of the current hexagon
			             )
			    ) AS roadlength -- Give a name to the sum results
		FROM
		(
			hex_grid
			JOIN
			kenya_roads
			ON ST_Intersects(hex_grid.geom, kenya_roads.geom)  -- Only for pairs that occupy the same space
		)
		GROUP BY hex.gid -- Collapse hexagon-gazetteer point pairs to just hexagons
	) AS temporary_holder --The placeholder name for the aggregate results
	WHERE hex_grid.gid=temporary_holder.gid --Update rows in the hexagon table with matching id number

Three: Measuring Distance to Nearest Something

On the left is a map of the same road segments described above.

On the right is a map of each hexagon’s distance to the nearest road, with darker areas further from any transportation infrastructure. The code takes advantage of PostGIS’s ability to use a special index called a GiST or Generalized Search Tree for fast nearest neighbor searches. For each hexagon, it sorts all of the roads by their distance from that hexagon and then returns the top 1 nearest result. This can be easily generalize to be the k nearest neighbors.

roads_vector becomes roads_nearest

ALTER TABLE hex_grid ADD COLUMN road_nearest float;
UPDATE hex_grid SET road_nearest = 0.0;
UPDATE hex_grid SET road_nearest = rex.length
	FROM
	(
		SELECT
			hex_grid.gid,
			ST_DISTANCE( --Calculate distance
				    hex_grid.the_geom, --From this hexagon
				    (SELECT --To its nearest road neghbor
					kenya_roads.geom --Each road's location
					FROM kenya_roads  --From the road table
					ORDER BY hex_grid.the_geom <-> kenya_roads.geom --Ordered by each road's distance from this hexagon's centroid
					LIMIT 1) --Take only the single nearest neighbor
				    ) AS length --Assign a name to the distance
		FROM hex_grid --Pull the hexagon ID and location above from the hex_grid table
	) AS temporary_holder --The placeholder name for the aggregate results
	WHERE hex_grid.gid=rex.gid --Update rows in the hexagon table with matching id number

Four: Take the Property of a Intersecting Polygon

On the left is a map of 1950s Kenya district boundaries. Imported into postgis as a shapefile, each row is a district with a geometry type of polygon.

On the right is a map of each hexagon taking the value of the district that overlaps with the centroid of the hexagon. Hexagons may overlap more than one district, but they will be assigned to the one that they almost always overlap the most with.

district_polygon becomes district_hex

 


ALTER TABLE hex_grid ADD COLUMN political_district varchar; --Add a column to the hexagon table
UPDATE hex_grid SET political_district = ''; --Initialize that column to an empty string
UPDATE hex_grid SET political_district = temporary_holder.district
	FROM
	(
		SELECT
			hex_grid.gid,
			(SELECT
				kenya_districts1962.district
				FROM
				kenya_districts1962
				WHERE ST_Intersects( --True if the district and hexagon centroid overlap
						    ST_Centroid(hex_grid.the_geom), --Take the hexagon's centroid
						    kenya_districts1962.geom
						   )
				LIMIT 1
			) AS districtname --Give a name to the result
		FROM hex_grid
	) AS temporary_holder
	WHERE hex_grid.gid=temporary_holder.gid

Five: Calculate Raster Statistics within a Polygon

Finally in the last example, on the left is a raster image from an elevation DEM clipped to the boundary of Kenya. PostGIS comes with a helper app for importing the raster as a table in the database. Rasters represent regularly spaced measurements across space as square pixels.

On the right is is a map of elevation aggregated at the hexagon level. To reduce processing time and warning messages thrown, it first subsets to just those hexagons that overlap with at least part of the raster.  It then clips the raster to each hexagon’s boundaries, and calculates raster summary statistics. ST_SummaryStats will calculate  the count of pixels as well as the sum, mean, standard deviation, min, and max of pixel values. We use only the sum statistic, which we could further divide by the count of pixels to get an average elevation. I only calculate the total here since polygons are equally sized with approximately the same number of pixels in the denominator.

elevation_raster becomes elevation_hex

 

ALTER TABLE hex_grid ADD COLUMN elevation_sum float;
UPDATE hex_grid SET elevation_sum = 0.0;
UPDATE hex_grid SET elevation_sum = temporary_holder.elevation
 FROM
 (
 SELECT
 hex_grid.gid,
 (ST_SummaryStats( --Function that calculates several statistics for a raster
 ST_Union(
 ST_Clip(rast,the_geom) --Clip the raster to the current polygon
 )
 )).sum --Access only the summation statistic
 as elevation --Rename the result
 FROM
 elevation, --From the
 hex_grid
 WHERE ST_Intersects(the_geom,rast) --Subset to only the hexagons covered by the raster to reduce errors/time
 GROUP BY hex_grid.gid --Test 

 ) AS temporary_holder
 WHERE hex_grid.gid=temporary_holder.gid
FacebookTwitterGoogle+RedditLinkedIn

Foot Notes   [ + ]

1. Rex Douglass and Kristen Harkness, The Cost of Silence: Collective Punishment and Intelligence Extortion during the Kenya Emergency. Working Paper.
2. Kenya; official standard names approved by the United States Board on Geographic Names. United States. Washington [U.S. Govt. Print. Off.] 1964.

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>