I often have to execute spatial joins between points and polygons, of say bombing events and the boundaries of the district they took place in. Quantum GIS uses ftools to execute these kinds of spatial joins, but failed on on a relatively modest join of 40k points and 9k boundaries. I could do the join in POSTGIS but I don’t want the overhead of a full spatial database for some quick analysis. The whole operation took only a few minutes to write up and few seconds to run in python, using the package pyshp to load shapefiles, rtree to build the spatial index, and shapely to do the final boundary check.

import shapefile import shapely #Load the shapefile of polygons and convert it to shapely polygon objects polygons_sf = shapefile.Reader("C:/PolygonShapeFile.shp") polygon_shapes = polygons_sf.shapes() polygon_points = [q.points for q in polygon_shapes ] from shapely.geometry import Polygon polygons = [Polygon(q) for q in polygon_points] #Load the shapefile of points and convert it to shapely point objects points_sf = shapefile.Reader("C:/PointShapeFile.shp") point_shapes = points_sf.shapes() from shapely.geometry import Point point_coords= [q.points[0] for q in point_shapes ] points = [Point(q.points[0]) for q in point_shapes ] #Build a spatial index based on the bounding boxes of the polygons from rtree import index idx = index.Index() count = -1 for q in polygon_shapes: count +=1 idx.insert(count, q.bbox) #Assign one or more matching polygons to each point matches = [] for i in range(len(points)): #Iterate through each point temp= None print "Point ", i #Iterate only through the bounding boxes which contain the point for j in idx.intersection( point_coords[i]): #Verify that point is within the polygon itself not just the bounding box if points[i].within(polygons[j]): print "Match found! ",j temp=j break matches.append(temp) #Either the first match found, or None for no matches

Hi,

First of all i would like to thank you for posting this usefull piece of code. Unfortunately for large amount of points (my sample have over 10 milion points) this code will eat huge amount of memory (which can cause an MemoryError) and will be very slow, becouse of iterating few times over the same points data. My proposition is to use generator expresions when you are loading and iterating through the points.

Code example:

points_sf = shapefile.Reader("C:/PointShapeFile.shp")

# i replaced '.shapes()' by '.iterShapes()' method to avoid loading all points into memory

point_shapes = points_sf.iterShapes()

from shapely.geometry import Point

# generator expresion which can be used in 'for' loop. I also removed point_coords list, becouse shapely Point class have attributes: 'x', 'y' and 'coords' (which is (x, y) tuple).

points = (Point(q.points[0]) for q in point_shapes)

Best regards,

DamnBack

Wow, Thanks a lot! Your code just made my spatial join process run like 15 times faster than another solution I was trying!

[…] http://rexdouglass.com/fast-spatial-joins-in-python-with-a-spatial-index/ […]

Is there a way to map this spatial join?

[…] a shout out to Rex Douglass and this blog post, I’ve adapted most of the python code here from that example. Also before we get started, it […]