# Data from Historical Maps: Extracting Backgrounds

As I’ve written here before, digitizing political maps is no easy task. One tough problem is digitizing background colors, which identify things like land cover. Consider this section from a Vietnam War-era military map of South Vietnam. There are three background regions: a dark green for forested area, a slightly lighter green for cleared forest, and white for completely cleared land. On top of that are lots of details, including brown elevation lines, black grid lines and text, etc.

How do we differentiate the background regions from the foreground? Define a background region as a semi-contiguous area of similar, but not identical, color.

Consider the following algorithm:
1) Use k-means clustering to posterize the image's colors into a small palette.
2) Count the number of pixels belonging to each cluster.
3) Iterate through each cluster separately
a) Set all non-cluster pixels to black
b) Perform a median (or modal) filter
4) Count the number of pixels in each cluster which survived the filter
5) Calculate the surviving count as a percentage of the original count.
6) Keep any cluster whose surviving percentage is above an arbitrary cutoff point.
7) Create a mask for all pixels belonging to clusters which we rejected
8) Interpolate under the mask with those belonging to the clusters we kept.

The final product will be an image segmented solely into the main background regions as defined by the process above.

The process also allows us to extract the foreground which we can then further segment apart from the background.

Here is a draft of a function which implements the algorithm in Python using the OpenCV library.

```python
#Function to split background from foreground in map images
#Rex W. Douglass, 6/12/2013
#rexdouglass.com

#Imports
import numpy as np
import cv2

#########################################
#Function Define: forebacksplit
#image - a 3 channel image as a numpy array
#clusters - number of colors to posterize the image into
#threshold - cutoff for how much of a cluster must survive a median filter. A higher cutoff means fewer candidates for background regions.
#Returns a background image with holes interpolated, and a foreground image with holes set to black.
def forebacksplit(image, clusters=32, threshold=30):
    original = image.copy()

    #Cluster Colors
    Z = original.reshape((-1, 3))  #reshape into one big vector of pixels
    Z = np.float32(Z)  #convert to np.float32 for kmeans
    #define criteria and number of clusters (K), then apply kmeans()
    #(OpenCV 3+ signature; for 2.4, drop the None bestLabels argument)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 10)
    ret, label, center = cv2.kmeans(Z, clusters, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    center = np.uint8(center)  #convert back to uint8 and rebuild the posterized image
    res = center[label.flatten()]
    original_clustered = res.reshape(original.shape)

    #Determine level of contiguity of each cluster
    percentchange = list()
    for color in center:
        incluster = np.all(original_clustered == color, axis=2)
        temp = original_clustered.copy()
        temp[np.logical_not(incluster)] = 0  #set all non-cluster pixels to black
        median = cv2.medianBlur(temp, 11)
        survived = np.all(median == color, axis=2)
        sumoriginal = np.sum(incluster)
        summedian = np.sum(survived)
        percentdiff = np.round((float(summedian) / sumoriginal) * 100)
        print(sumoriginal, summedian, percentdiff)
        percentchange.append(percentdiff)

    #Select clusters above an arbitrary threshold
    percentchange = np.asarray(percentchange)
    surviving_clusters = center[percentchange > threshold]

    #Split image based on surviving clusters
    #a pixel is background if its color matches any surviving cluster
    pixels = original_clustered.reshape((-1, 3))
    background1d = np.zeros(len(pixels), dtype=bool)
    for color in surviving_clusters:
        background1d |= np.all(pixels == color, axis=1)
    background2d = background1d.reshape(original_clustered.shape[:2])
    foreground2d = np.logical_not(background2d)  #invert: true for foreground

    foreground = original_clustered.copy()
    foreground[background2d, :] = 0

    background = original_clustered.copy()
    background[foreground2d, :] = 0

    #Now fill in background as solid: inpaint under a single-channel mask of foreground pixels
    mask = np.uint8(foreground2d) * 255
    background = cv2.inpaint(original_clustered, mask, 5, cv2.INPAINT_TELEA)  #slow

    #Optional last median pass to despeckle
    #background = cv2.medianBlur(background, 9)

    return (background, foreground)  #returns a full background image and a full foreground image

################################
#Begin main code
original_bgr = cv2.imread('sample2.png', 1)  #varied sample map
background, foreground = forebacksplit(original_bgr, 64, 70)  #split with 64 colors and a threshold of 70

cv2.imshow('original', original_bgr)
cv2.imshow('background', background)
cv2.imshow('foreground', foreground)
cv2.waitKey(0)
```

### 3 comments on Data from Historical Maps: Extracting Backgrounds

• Christ

Hi,

First of all, thanks for this function to split backgrounds.
May be I am missing something, but it looks like in "foreground2d = logical_not(background2d)", "logical_not" isn't defined.
To fix the error I define "logical_not= np.in1d(original_clustered_string,surviving_clusters_string,invert=True)", is that correct?

Thanks

• Christ

My last question was due to the fact that I still have the following error (I am using the same image as the above one). I understand the error but I don't see where it comes from. So I thought there may be something wrong with the definition of "logical_not". (I have fixed the errors with "np.sum" and "np.round").

OpenCV Error: Sizes of input arguments do not match (All the input and output images must have the same size) in cvInpaint, file /build/buildd/opencv-2.4.8+dfsg1/modules/photo/src/inpaint.cpp, line 738
Traceback (most recent call last):
File "testBG.py", line 77, in
background,foreground = forebacksplit(original_bgr,64,70) #split with 64 colors and a threshold of 70
File "testBG.py", line 66, in forebacksplit
original_clustered_inpaint = cv2.inpaint(original_clustered, mask, 5, cv2.INPAINT_TELEA) #slow
cv2.error: /build/buildd/opencv-2.4.8+dfsg1/modules/photo/src/inpaint.cpp:738: error: (-209) All the input and output images must have the same size in function cvInpaint

I would greatly appreciate any help to fix it.
Thanks

• Christ

Sorry to bother you!
I've fixed the errors (it took me some time, as I am new to OpenCV).
I only need to make the following changes:
sumoriginal= np.sum(condition==False)
summedian= np.sum(condition2==False)
percentdiff= np.round((summedian/sumoriginal)*100)
foreground2d = np.logical_not(background2d)