For machine vision projects, I often need a quick and easy way to label images as training data. After a long false start exporting images as cells in an Excel file, I found a rather elegant online solution: Dropbox hosts the image segments, and a Google spreadsheet serves as the labeling interface for me or my research assistants.
Step 1: Image Files
First you’ll need image segments. In this example, I’m using scans of a 1970 U.S. military gazetteer of South Vietnam and Viet Cong named places and their geographic locations. Unfortunately, the print and scan quality are both rather poor, so we’re going to have to help the OCR software with lots of hand-labeled examples. Below are samples of the original image with bounding boxes, which I generated with a Python script, and of the final image segments, each extracted and saved as an individual file.
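The extraction step can be sketched roughly as follows. This is a minimal illustration and not the original script: it assumes the page is already loaded as rows of 8-bit grayscale pixels and that the bounding boxes are already known, and the names crop, png_bytes, and save_segments are hypothetical. It uses only the standard library; in practice an imaging library would normally handle the reading and writing.

```python
import os
import struct
import zlib


def crop(page, box):
    """Return the part of `page` inside `box`.

    `page` is a list of pixel rows (8-bit grayscale values);
    `box` is (left, top, right, bottom), right and bottom exclusive.
    """
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]


def png_bytes(pixels):
    """Encode rows of 8-bit grayscale pixels as a minimal PNG file."""

    def chunk(kind, data):
        # A PNG chunk is: length, type, data, CRC over type + data.
        body = kind + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    height, width = len(pixels), len(pixels[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then default compression, filter, and interlace settings.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))


def save_segments(page, boxes, out_dir, start=1):
    """Write one PNG per box, named by index: 1.png, 2.png, ..."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, box in enumerate(boxes, start=start):
        path = os.path.join(out_dir, f"{index}.png")
        with open(path, "wb") as handle:
            handle.write(png_bytes(crop(page, box)))
        paths.append(path)
    return paths
```

Calling save_segments(page, boxes, "segments") would write segments/1.png, segments/2.png, and so on. Naming each file by a plain index is what lets the spreadsheet rebuild every image URL from a base URL and an index column.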
Step 2: Hosting
Next you’ll need to host the images on the net. Dropbox provides a quick and easy way to do this through its Public folder. Just copy the files into a subfolder of the Public folder and copy the public link.
Step 3: Labeling Interface
Next, create a Google spreadsheet with three columns: an index that matches the image filename, a column where humans label each image, and a column whose cells will hold the image, loaded from its URL.
There are multiple ways to include images in a spreadsheet, but the one that works best is to include the image as a live URL contained entirely within a cell. You do this with the simple formula “=image(imageurl, 3)”.
Here imageurl is the URL of an individual file. You can build it from the base URL, an index number that matches the file name, and the image extension, e.g. “=image($F$1 & A2 & ".png", 3)”.
The flag “3” keeps the image at its original size, which is very important for speed in a spreadsheet with hundreds of images.
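If you have many rows, the three columns can be generated rather than typed. The sketch below is one way to do that under stated assumptions: the index sits in column A starting at row 2, the base URL sits in cell $F$1, and the helper names image_formula, build_rows, and to_tsv are hypothetical. It emits tab-separated text intended for pasting into an empty sheet; how a paste treats cells that start with “=” is an assumption worth checking on a few rows first.

```python
def image_formula(row, base_cell="$F$1", ext=".png", mode=3):
    """Build the =image(...) formula for the given sheet row.

    The URL is assembled inside the sheet from the base URL cell,
    the index in column A of the same row, and the file extension.
    """
    return f'=image({base_cell} & A{row} & "{ext}", {mode})'


def build_rows(indices, first_row=2):
    """Return [index, empty label, formula] for each image index."""
    rows = []
    for offset, index in enumerate(indices):
        rows.append([index, "", image_formula(first_row + offset)])
    return rows


def to_tsv(rows):
    """Join rows into tab-separated lines, one sheet row per line."""
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)
```

For example, print(to_tsv(build_rows(range(1, 51)))) produces 50 rows, a small enough batch to test before adding more.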
Step 4: Prepare for Sharing with Others
Select every column that users should not change and make it read-only for everyone except you by going to Data->Named and Protected Ranges.
Change the sharing settings so that anyone with the link can edit the spreadsheet, then share the link with your workers. Log out of Google and verify that you can edit only the label column and nothing else.
- Be patient with Dropbox: the image files are small but numerous, and they take a while to index and push to the cloud.
- Start with a small number of rows in the Google spreadsheet and save a backup before adding hundreds or thousands of rows. The image URLs are updated dynamically, and too many will slow or even crash a sheet.
- Recall that absolute cell locations are indicated with a “$”, so you can put the base URL in a cell and reference it with an absolute location like “$F$1”.
- Some image loads may fail with the error “#N/A” even if the image itself is available at the right URL. I believe this is a problem with image fetching on Google’s end.
- You will almost always see the warning “Some formulas are taking a while to calculate.” at the top of the sheet; it is safe to dismiss.
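When a cell shows “#N/A”, it helps to rule out a missing file before blaming the sheet. The following is a minimal sketch, assuming the files are named by index under a single base URL and that the host answers HEAD requests (some hosts may not). The function names are hypothetical, and the opener parameter exists only so the check can be exercised without a network.

```python
from urllib.request import Request, urlopen


def image_url(base_url, index, ext=".png"):
    """Rebuild an image URL the same way the sheet formula does."""
    return f"{base_url}{index}{ext}"


def is_reachable(url, opener=urlopen, timeout=10):
    """Return True if a HEAD request for `url` answers with status 200."""
    request = Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return False


def unreachable_indices(base_url, indices, opener=urlopen):
    """List the indices whose image URL could not be fetched."""
    return [
        index
        for index in indices
        if not is_reachable(image_url(base_url, index), opener=opener)
    ]
```

If unreachable_indices returns an empty list for the rows showing “#N/A”, the files are in place and the failure is more likely on the spreadsheet side.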