Making GIS Products for Historical Periods

Developing GIS products for older historical periods can be tricky. The further back in time you go, the less comprehensive the map coverage, the less accurate the surveying, and the more obscure the projections. In a project with Kristen Harkness and others, we are recovering and mapping events during an insurgency that took place 60 [...]

Quick Land Cover Estimates from Satellite Imagery and OpenStreetMap

Land cover (land use) estimates assign points or regions on the earth’s surface to classes like forested, farmland, urban development, etc. There are hundreds of land cover data sets and methods covering different regions, time periods, and special topics. In a paper under review, my coauthors and I test methods of estimating population density at [...]

Training Neural Networks on the GPU: Installation and Configuration

Neural networks are fantastic tools for classification and regression, but they are slow to train because they depend on gradient descent across thousands or even millions of parameters. In fact, they are a relatively old idea that has recently come back into vogue in part because speed increases in modern CPUs and particularly the large [...]

Fastest Random Forest: Sklearn?

I was tipped off by a GitHub thread that the development version of the Random Forest Classifier in Sklearn (15-dev) had major speed improvements. I built a small benchmark using the MNIST handwriting data set and compared the training and prediction speeds of Sklearn (14.1), Sklearn (15-dev), and WiseRF (1.1). At least in [...]

Labeling Data for Image Segments Using Dropbox and Google Docs

For machine vision projects, I often need a quick and easy way to label images as training data. After a long false start with exporting images as cells in an Excel file, I found a rather elegant online solution using Dropbox to host image segments and Google Spreadsheets for labeling by myself or research assistants.


Extracting Data from Printed Tables in Historical Documents

A remarkable amount of data is hiding in historical records in handwritten forms, electronic printouts, or typed tables. This post describes methods I use for three types of difficult documents: consistently structured forms, inconsistently structured forms, and near machine-readable tables.


Data from Historical Maps: Extracting Backgrounds

As I’ve written on here before, digitizing political maps is no easy task. One tough problem is digitizing background colors, which identify things like land cover. Consider this section from a Vietnam War-era military map of South Vietnam. There are three background regions: a dark green for forested area, a slightly lighter green for [...]