Data from Historical Maps: Extracting Backgrounds


As I’ve written here before, digitizing political maps is no easy task. One tough problem is digitizing background colors, which identify things like land cover. Consider this section from a Vietnam War era military map of South Vietnam. There are three background regions: a dark green for forested area, a slightly lighter green for cleared forest, and white for completely clear terrain. On top of that are many details, including brown elevation lines, black grid lines, text, and so on.

How do we differentiate the background regions from the foreground? Define a background region as a semi-contiguous area of similar, but not identical, color.

Consider the following algorithm:
1) Use k-means clustering to group the image's colors into a fixed number of clusters.
2) Count the number of pixels belonging to each cluster.
3) Iterate through each cluster separately:
a) Set all non-cluster pixels to black.
b) Apply a median (or modal) filter.
4) Count the number of pixels in each cluster which survived the filter.
5) Calculate the percentage of the original count that survived.
6) Keep any cluster whose surviving percentage is above an arbitrary cutoff.
7) Create a mask over all pixels belonging to the rejected clusters.
8) Interpolate under the mask from the pixels belonging to the kept clusters.

The final product is an image segmented solely into the main background regions, as defined by the process above.


The process also lets us extract the foreground, which we can then segment further, apart from the background.
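Splitting on the kept clusters (step 7) boils down to a per-pixel set-membership test: does this pixel's color appear in the list of surviving cluster centers? A minimal sketch using NumPy broadcasting, with colors made up for illustration:

```python
import numpy as np

#A tiny posterized "image" with three colors
img = np.array([[[0, 128, 0], [0, 128, 0]],
                [[255, 255, 255], [10, 10, 10]]], dtype=np.uint8)

#Cluster centers we decided to keep as background
kept = np.array([[0, 128, 0], [255, 255, 255]], dtype=np.uint8)

#Compare every pixel against every kept color: a pixel is background
#if all three channels match any kept color
background2d = (img.reshape(-1, 1, 3) == kept.reshape(1, -1, 3)) \
                   .all(axis=2).any(axis=1).reshape(img.shape[:2])
print(background2d)
# [[ True  True]
#  [ True False]]
```

The resulting 2-d boolean mask is what gets inverted to produce the foreground mask and fed to the inpainting step.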


Here is a draft of a function which implements the algorithm in Python using the OpenCV library.

#Function to split background from foreground in map images
#Rex W. Douglass, 6/12/2013
#rexdouglass.com

#Imports
import numpy as np
import cv2

#########################################
#Function Define: forebacksplit
#Accepts a 3 channel image as a numpy array
#clusters - number of colors to posterize image into
#threshold - cutoff for what percentage of a cluster's pixels must survive a median filter over its neighborhood. A higher cutoff means fewer candidates for background regions.
#Returns a background image with holes interpolated, and a foreground region with holes set to black.
def forebacksplit(image, clusters=32, threshold=30):
        original=image.copy()

        #Cluster Colors
        Z = original.reshape((-1,3)) #reshape into one big vector
        Z = np.float32(Z) # convert to np.float32
        # define criteria, number of clusters (K), and apply kmeans()
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 10)
        ret,label,center = cv2.kmeans(Z,clusters,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS) #OpenCV 3+ kmeans takes a bestLabels argument (None here)
        center = np.uint8(center) # Now convert back into uint8, and make original image
        res = center[label.flatten()]
        original_clustered = res.reshape((original.shape))

        #Determine level of contiguity of each cluster
        percentchange = list()
        for color in center:
            condition = original_clustered != color
            temp = original_clustered.copy()
            temp[condition] = 0 #black out everything not in this cluster
            median = cv2.medianBlur(temp, 11)
            condition2 = median != color
            sumoriginal = np.sum(condition == False) #matching entries before filtering
            summedian = np.sum(condition2 == False) #matching entries after filtering
            percentdiff = np.round((summedian / float(sumoriginal)) * 100) #float division so this also works under Python 2
            print (sumoriginal, summedian, percentdiff)
            percentchange.append(percentdiff)

        #Select clusters above an arbitrary threshold
        percentchange = np.asarray(percentchange)
        surviving_clusters = center[percentchange>threshold]

        #Split image based on surviving clusters
        #Match whole pixels against the kept cluster colors by broadcasting over the color axis
        pixels = original_clustered.reshape(-1, 1, 3)
        mask_logical = (pixels == surviving_clusters.reshape(1, -1, 3)).all(axis=2).any(axis=1)

        background2d = mask_logical.reshape(original_clustered.shape[0:2]) #convert to a 2d mask, true for background
        foreground2d = np.logical_not(background2d)  #invert the background mask, true for foreground

        foreground = original_clustered.copy()
        foreground[background2d,:]=0

        background = original_clustered.copy()
        background[foreground2d,:]=0        

        #Now fill in background as solid
        mask = np.array(foreground2d*255, dtype=np.uint8) #foreground mask as an image
        original_clustered_inpaint = cv2.inpaint(original_clustered, mask, 5, cv2.INPAINT_TELEA  ) #slow
        background=original_clustered_inpaint

        #Optional last median pass to despeckle
        #background = cv2.medianBlur(original_clustered_inpaint, 9)

        return(background,foreground) #returns a full background image and a full foreground image

################################
#Begin main code       
original_bgr = cv2.imread('sample2.png',  1 ) #varied sample map
background,foreground = forebacksplit(original_bgr,64,70) #split with 64 colors and a threshold of 70

cv2.imshow('original',original_bgr)
cv2.imshow('background',background)
cv2.imshow('foreground',foreground)
cv2.waitKey(0) #imshow windows won't render without a waitKey
cv2.destroyAllWindows()

3 comments to Data from Historical Maps: Extracting Backgrounds

  • Christ

    Hi,

    First of all, thanks for this function to split backgrounds.
    Maybe I am missing something, but it looks like in "foreground2d = logical_not(background2d)", "logical_not" isn't defined.
    To fix the error I define "logical_not= np.in1d(original_clustered_string,surviving_clusters_string,invert=True)", is that correct?

    Thanks

  • Christ

    My last question was due to the fact that I still have the following error (I am using the same image as the above one). I understand the error but I don't see where it comes from. So I thought there may be something wrong with the definition of "logical_not". (I have fixed the errors with "np.sum" and "np.round").

    OpenCV Error: Sizes of input arguments do not match (All the input and output images must have the same size) in cvInpaint, file /build/buildd/opencv-2.4.8+dfsg1/modules/photo/src/inpaint.cpp, line 738
    Traceback (most recent call last):
    File "testBG.py", line 77, in
    background,foreground = forebacksplit(original_bgr,64,70) #split with 64 colors and a threshold of 70
    File "testBG.py", line 66, in forebacksplit
    original_clustered_inpaint = cv2.inpaint(original_clustered, mask, 5, cv2.INPAINT_TELEA) #slow
    cv2.error: /build/buildd/opencv-2.4.8+dfsg1/modules/photo/src/inpaint.cpp:738: error: (-209) All the input and output images must have the same size in function cvInpaint

    I would greatly appreciate any help to fix it.
    Thanks

  • Christ

    Sorry to bother you!
    I've fixed the errors (it took me some time as I am new to opencv).
    I only need to make the following changes:
    sumoriginal= np.sum(condition==False)
    summedian= np.sum(condition2==False)
    percentdiff= np.round((summedian/sumoriginal)*100)
    foreground2d = np.logical_not(background2d)
    mask= np.array(foreground2d*255, dtype=np.uint8)

    Best
    P.S. please delete my first comments.
