# Five Fast and Replicable Spatial Tasks with PostGIS

There is a crisis of replicability in scientific results that involve spatial data because important calculations are often carried out by hand in proprietary software.  Without code to serve as a paper trail, important steps in the analysis become difficult to check for errors,  alter for testing , or even to really document adequately. Performing spatial tasks by hand is tedious and discourages researchers from performing standard robustness checks, adding additional measures, or trying new things. One solution to both problems is to encourage the use of spatially aware languages and databases solutions.  This brief post outlines how to perform five common spatial tasks using one of the best tools available, the  PostGIS spatial extension to the popular and free PostgresSQL database.

# Starting with a Hexagon Lattice Base Layer

For these examples, I’m using data collected for a project with Kristen Harkness, on an insurgency that took place 60 years ago in Colonial Kenya called the Mau Mau Rebellion. 1Rex Douglass and Kristen Harkness, The Cost of Silence: Collective Punishment and Intelligence Extortion during the Kenya Emergency. Working Paper. Our base layer is a hexagon lattice covering the 1950s borders of Kenya. It consists of 6,300 regular hexagons each 11km wide.  I created this lattice and added it as a table called “hex_grid” using a function described in my previous post.

## One: Counting Points within a Boundary

On the left is a map of 4,703 buildings listed by name, type, latitude, and longitude by a 1964 Gazeteer.2Kenya; official standard names approved by the United States Board on Geographic Names. United States. Washington [U.S. Govt. Print. Off.] 1964. The text was OCR’d, cleaned and processed into a data frame and outputted as a shapefile in Python. I imported the shapefile into PostGIS using  the Shapefile Import/Export Manager. It was created as a table called “historical_gazetteer” where each row is a building and each building’s geometry is single point.

On the right is the count of buildings within each hexagon of the base layer. It takes only three lines of SQL to create which I’ve expanded greatly below to provide explanatory comments. It works by first joining the hexagon table on the historical_gazetteer table where rows in one spatially overlap with rows in the other.  It then goes hexagon by hexagon, summing the count of matching gazetteer rows that are listed as buildings.

```ALTER TABLE hex_grid ADD COLUMN gaz_building float; --Add a column to the hexagon table
UPDATE hex_grid SET gaz_building = 0.0; --Initialize that column to a count of zero
UPDATE hex_grid SET gaz_building = temporary_holder.building_count
FROM
(
SELECT
hex_grid.gid, --Keep the ID for each hexagon to help specify which rows to update
sum( --The aggregation function used in collapsing to just hexagons
historical_gazetteer.building --Just a vector of ones meaning yes if building
) AS building_count --Give the total a name for easy reference
FROM
(
hex_grid
INNER JOIN -- Make a new table where each pair of hexagon-gazetteer point is a row
historical_gazetteer
ON ST_Intersects(hex_grid.geom, historical_gazetteer.geom) -- Only for pairs that occupy the same space
)
GROUP BY hex_grid.gid -- Collapse hexagon-gazetteer point pairs to just hexagons
) AS temporary_holder --The placeholder name for the aggregate results
WHERE hex_grid.gid=temporary_holder.gid --Update rows in the hexagon table with matching id number
```

## Two: Measure Vectors in a Boundary

On the left is a map of roads in 1950s Kenya produced from a vectorization of modern road maps cross referenced against a contemporary road atlas in a process I describe here. It is imported to PostGIS from a shapefile, where each row is an individual road segment with a geometry that is a Linestring.

On the right is the length of road vectors passing through each hexagon. It works by first clipping road vectors to the boundaries of a given hexagon, calculating their length, and then totaling them.

```ALTER TABLE hex_grid ADD COLUMN road_length float; -Add a column to the hexagon table
UPDATE hex_grid SET road_length = 0.0; --Initialize that column to a count of zero
FROM
(
SELECT
hex.gid,
sum( --Total those bits for each hexagon
ST_Length( --Take the length of the bits of road within the hexagon
ST_Intersection(hex_grid.geom, roads.geom ) --Clip intersecting road vectors to just the the space of the current hexagon
)
) AS roadlength -- Give a name to the sum results
FROM
(
hex_grid
JOIN
ON ST_Intersects(hex_grid.geom, kenya_roads.geom)  -- Only for pairs that occupy the same space
)
GROUP BY hex.gid -- Collapse hexagon-gazetteer point pairs to just hexagons
) AS temporary_holder --The placeholder name for the aggregate results
WHERE hex_grid.gid=temporary_holder.gid --Update rows in the hexagon table with matching id number
```

## Three: Measuring Distance to Nearest Something

On the left is a map of the same road segments described above.

On the right is a map of each hexagon’s distance to the nearest road, with darker areas further from any transportation infrastructure. The code takes advantage of PostGIS’s ability to use a special index called a GiST or Generalized Search Tree for fast nearest neighbor searches. For each hexagon, it sorts all of the roads by their distance from that hexagon and then returns the top 1 nearest result. This can be easily generalize to be the k nearest neighbors.

```ALTER TABLE hex_grid ADD COLUMN road_nearest float;
UPDATE hex_grid SET road_nearest = 0.0;
UPDATE hex_grid SET road_nearest = rex.length
FROM
(
SELECT
hex_grid.gid,
ST_DISTANCE( --Calculate distance
hex_grid.the_geom, --From this hexagon
(SELECT --To its nearest road neghbor
ORDER BY hex_grid.the_geom <-> kenya_roads.geom --Ordered by each road's distance from this hexagon's centroid
LIMIT 1) --Take only the single nearest neighbor
) AS length --Assign a name to the distance
FROM hex_grid --Pull the hexagon ID and location above from the hex_grid table
) AS temporary_holder --The placeholder name for the aggregate results
WHERE hex_grid.gid=rex.gid --Update rows in the hexagon table with matching id number
```

## Four: Take the Property of a Intersecting Polygon

On the left is a map of 1950s Kenya district boundaries. Imported into postgis as a shapefile, each row is a district with a geometry type of polygon.

On the right is a map of each hexagon taking the value of the district that overlaps with the centroid of the hexagon. Hexagons may overlap more than one district, but they will be assigned to the one that they almost always overlap the most with.

```
ALTER TABLE hex_grid ADD COLUMN political_district varchar; --Add a column to the hexagon table
UPDATE hex_grid SET political_district = ''; --Initialize that column to an empty string
UPDATE hex_grid SET political_district = temporary_holder.district
FROM
(
SELECT
hex_grid.gid,
(SELECT
kenya_districts1962.district
FROM
kenya_districts1962
WHERE ST_Intersects( --True if the district and hexagon centroid overlap
ST_Centroid(hex_grid.the_geom), --Take the hexagon's centroid
kenya_districts1962.geom
)
LIMIT 1
) AS districtname --Give a name to the result
FROM hex_grid
) AS temporary_holder
WHERE hex_grid.gid=temporary_holder.gid

```

## Five: Calculate Raster Statistics within a Polygon

Finally in the last example, on the left is a raster image from an elevation DEM clipped to the boundary of Kenya. PostGIS comes with a helper app for importing the raster as a table in the database. Rasters represent regularly spaced measurements across space as square pixels.

On the right is is a map of elevation aggregated at the hexagon level. To reduce processing time and warning messages thrown, it first subsets to just those hexagons that overlap with at least part of the raster.  It then clips the raster to each hexagon’s boundaries, and calculates raster summary statistics. ST_SummaryStats will calculate  the count of pixels as well as the sum, mean, standard deviation, min, and max of pixel values. We use only the sum statistic, which we could further divide by the count of pixels to get an average elevation. I only calculate the total here since polygons are equally sized with approximately the same number of pixels in the denominator.

```ALTER TABLE hex_grid ADD COLUMN elevation_sum float;
UPDATE hex_grid SET elevation_sum = 0.0;
UPDATE hex_grid SET elevation_sum = temporary_holder.elevation
FROM
(
SELECT
hex_grid.gid,
(ST_SummaryStats( --Function that calculates several statistics for a raster
ST_Union(
ST_Clip(rast,the_geom) --Clip the raster to the current polygon
)
)).sum --Access only the summation statistic
as elevation --Rename the result
FROM
elevation, --From the
hex_grid
WHERE ST_Intersects(the_geom,rast) --Subset to only the hexagons covered by the raster to reduce errors/time
GROUP BY hex_grid.gid --Test

) AS temporary_holder
WHERE hex_grid.gid=temporary_holder.gid
```

Foot Notes   [ + ]

 1 ↑ Rex Douglass and Kristen Harkness, The Cost of Silence: Collective Punishment and Intelligence Extortion during the Kenya Emergency. Working Paper. 2 ↑ Kenya; official standard names approved by the United States Board on Geographic Names. United States. Washington [U.S. Govt. Print. Off.] 1964.