Developing GIS products for older historical periods can be tricky. The further back in time you go, the less comprehensive the map coverage, the less accurate the surveying, and the more obscure the projections. In a project with Kristen Harkness and others, we are recovering and mapping events during the Mau Mau Rebellion, an insurgency that took place 60 years ago in colonial Kenya. The episode is surprisingly data-rich and relevant to modern counterinsurgency debates, but making sense of such an old case with modern econometric tools means solving very practical GIS problems, such as how to develop a period-accurate map of roads and infrastructure. This post details a two-pronged approach we developed that should be widely applicable to other cases.
Two Directions to Attack the Problem
Working Forward from Historical Maps
The most detailed map we could find of Kenya's 1950s road system was an atlas prepared by the Road Engineers Office Public Works Department in Nairobi, February 1951, “Colony and Protectorate of Kenya Key Plan to Road Maps of the Colony Prepared to Scale 1:500,000.” We located and digitized a copy at the National Archives of the United Kingdom at Kew.
Unfortunately, the details of individual roads were spread out across different sheets of the atlas. Worse still, digitization of large-format materials at the archives is unnecessarily complicated and expensive, so we ended up taking many smaller high-resolution photos of each map sheet. We then combined the individual images with Microsoft’s Image Composite Editor (ICE), which performs a process called image stitching. If you’re interested in the underlying mechanics, a stitching algorithm identifies matching control points through something like SIFT features and then finds the affine transformations that line up two or more images on those points with minimal squared error.
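To make the least-squares step concrete, here is a minimal sketch in Python with NumPy. It is not what ICE does internally, which I have not inspected; it only shows how an affine transformation can be fit to matched control points by minimizing squared error. The control points, the `true_M` matrix, and the function names `fit_affine` and `apply_affine` are all made up for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points.

    src, dst: (n, 2) arrays of matched control points, n >= 3 and not collinear.
    Returns a 2x3 matrix M such that dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: each row is [x, y, 1].
    A = np.hstack([src, np.ones((len(src), 1))])
    # Solve A @ X ~= dst in the least-squares sense; X has shape (3, 2).
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X.T

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]

# Hypothetical control points in photo A (pixel coordinates).
src = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], dtype=float)
# Hypothetical "true" placement of photo A in photo B's frame:
# a slight rotation/shear plus a shift of (250, 10) pixels.
true_M = np.array([[1.0, 0.02, 250.0],
                   [-0.02, 1.0, 10.0]])
dst = apply_affine(true_M, src)

M = fit_affine(src, dst)
print(np.round(M, 3))
```

With noisy real-world matches the recovered matrix would only approximate the true one, and a stitcher would normally discard outlier matches before fitting.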
A few of the reconstructed map sheets are shown below (notice the interesting effect of my hand appearing in multiple places in the same image).
Once stitching is complete, you have a single image covering the entire country. That image then needs to be georeferenced: lined up and warped so that it fits real-world geography. We did that step by hand with the Georeferencer in QGIS, which simply required picking control points on the image and their corresponding points on Google satellite imagery.
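One way to sanity-check a georeferenced image is to push each control point through the fitted transform and see how far it lands from its reference coordinate. The sketch below assumes a simple six-coefficient affine geotransform in GDAL's ordering; the coefficients and control points are invented for illustration and are not taken from our map.

```python
import math

def pixel_to_world(gt, col, row):
    """Apply a six-coefficient affine geotransform (GDAL ordering)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def rms_residual(gt, control_points):
    """Root-mean-square distance between predicted and reference coordinates."""
    squared = []
    for (col, row), (lon, lat) in control_points:
        x, y = pixel_to_world(gt, col, row)
        squared.append((x - lon) ** 2 + (y - lat) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Made-up geotransform: top-left corner at 33.0E, 5.0N,
# 0.001 degrees per pixel, north-up (no rotation terms).
gt = (33.0, 0.001, 0.0, 5.0, 0.0, -0.001)

# Made-up control points: ((col, row), (lon, lat)).
control_points = [
    ((0, 0), (33.0, 5.0)),
    ((1000, 0), (34.0, 5.0)),
    ((0, 2000), (33.0, 3.0)),
    ((1000, 2000), (34.002, 3.0)),  # deliberately off by 0.002 degrees
]

error = rms_residual(gt, control_points)
print(f"RMS residual: {error:.4f} degrees")
```

Residuals that are large in only one region of the image suggest a local distortion that a single affine transform cannot absorb.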
Labeling and Supervised Classification
Rather than hand trace each and every road, I use a supervised learning approach. I label some pixels as primary roads (white), secondary roads (purple), and background (red). These manual classifications serve as labels, and a sliding 11×11 pixel window provides the features for each pixel. I use a random forest as the classifier, assigning every pixel in the image to the category with the highest predicted probability. The full road network includes 90% more pixels than were classified by hand, shown in green against a black background below.
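For readers who want to see what the sliding-window features look like, here is a minimal NumPy sketch. It assumes a single-band (grayscale) image and reflection padding at the edges, neither of which is specified above, and it uses a random array in place of the scanned map. The label codes (0 = unlabeled, 1 = primary road, 3 = background) and the name `window_features` are illustrative only.

```python
import numpy as np

def window_features(image, size=11):
    """Return one flattened size x size neighbourhood per pixel.

    image: 2D array. Edges are padded by reflection so that every
    pixel gets a full window. Output shape: (rows * cols, size * size).
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    # Shape (rows, cols, size, size): one window centred on each pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.reshape(image.shape[0] * image.shape[1], size * size)

# Stand-in for the scanned map: a small random grayscale image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30))
features = window_features(image)

# Hypothetical hand labels: 0 means "not labeled".
labels = np.zeros(image.shape, dtype=int)
labels[2:4, 5:10] = 1    # a patch marked as primary road
labels[10:12, 0:6] = 3   # a patch marked as background

# Only hand-labeled pixels are used for training.
mask = labels.ravel() > 0
X_train = features[mask]
y_train = labels.ravel()[mask]
print(features.shape, X_train.shape)
```

`X_train` and `y_train` could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`, with `features` supplying the rows to predict for every pixel in the image.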
Comparing the results to satellite images revealed several weaknesses in this approach:
- It requires further cleaning, removing false positives and filling in gaps from false negatives.
- Stitching and georeferencing introduced systematic errors for some parts of the country.
- The resolution and accuracy of the atlas are underwhelming. Lots of small roads are documented, but their length and shape vary wildly from the reality on the ground.
Working Backward from a Modern GIS Product
Alternatively, there are high-quality vector shapefiles of roads in Kenya based on map sheets dated between 1978 and 1997. The International Livestock Research Institute serves a number of these maps free of charge.
Checked against satellite imagery, these shapefiles proved both accurate and comprehensive. The catch is that they include many roads that were built or extended long after our time period of interest. Our solution is to overlay the modern shapefile on the georeferenced 1951 atlas image constructed above and to delete segments that don’t appear in both.
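We did the deletion by hand, but the same idea could be used as an automated first pass that flags candidates for review. The sketch below is only an illustration of that idea, not our workflow: it assumes the modern road segments have already been converted to pixel coordinates of the classified atlas raster, and the tiny mask, segment names, `tolerance`, and `threshold` values are all hypothetical.

```python
import math

def sample_polyline(vertices, step=1.0):
    """Return points spaced at most `step` apart along a polyline."""
    points = [vertices[0]]
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        for i in range(1, n + 1):
            t = i / n
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points

def overlap_fraction(vertices, road_mask, tolerance=1):
    """Share of sampled points with a road pixel within `tolerance` pixels."""
    rows, cols = len(road_mask), len(road_mask[0])
    points = sample_polyline(vertices)
    hits = 0
    for x, y in points:
        c, r = round(x), round(y)
        found = any(
            road_mask[rr][cc]
            for rr in range(max(0, r - tolerance), min(rows, r + tolerance + 1))
            for cc in range(max(0, c - tolerance), min(cols, c + tolerance + 1))
        )
        hits += found
    return hits / len(points)

def prune(segments, road_mask, threshold=0.5):
    """Keep segments that mostly coincide with roads in the old map."""
    return {
        name: vertices
        for name, vertices in segments.items()
        if overlap_fraction(vertices, road_mask) >= threshold
    }

# Toy 10x10 classified raster with one horizontal road along row 5.
road_mask = [[row == 5 for _ in range(10)] for row in range(10)]

# Toy modern segments as (col, row) vertices.
segments = {
    "old_road": [(0, 5), (9, 5)],  # follows the 1951 road
    "new_road": [(2, 0), (2, 9)],  # only crosses it
}

kept = prune(segments, road_mask)
print(sorted(kept))
```

The tolerance matters because georeferencing error means the old and new roads rarely line up pixel for pixel; a real pass would need a much larger buffer than the one pixel used here.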
The final product is a pruned modern shapefile containing the roads that existed in the 1950s, plus a few false positives that were built since but not deleted by hand. It is missing roads that may have existed at the time but have since been moved or destroyed. A KML version on Google Maps is provided below.