Five Fast and Replicable Spatial Tasks with PostGIS

There is a crisis of replicability in scientific results that involve spatial data because important calculations are often carried out by hand in proprietary software. Without code to serve as a paper trail, important steps in the analysis become difficult to check for errors, alter for testing, or even document adequately. Performing [...]

Parsing XML to a Data Frame: Recovering the Worldwide Incidents Tracking System (WITS)

The Worldwide Incidents Tracking System (WITS) was a database of global terrorism events compiled by the National Counterterrorism Center (NCTC) until 2012. By the end it contained 68,939 records, each with a short synopsis of the event, and it is thus still of interest to conflict scholars. Unfortunately, it’s now defunct and getting a copy can be [...]


In a follow-up to my post about generating lots of real-world random data in R, this brief post shows how to generate lots of realistic functions. By sampling from the PDF and CDF of real-world data, you can quickly generate all manner of continuous and step functions for further experimentation.
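As a rough illustration of the idea (written in Python rather than R, and not taken from the post itself), the sketch below builds a step function from the empirical CDF of a sample and a continuous function from a Gaussian kernel density estimate of the same sample. The helper names `make_ecdf` and `make_kde`, the bandwidth value, and the synthetic stand-in data are all assumptions made for the example.

```python
import bisect
import math
import random


def make_ecdf(data):
    """Return the empirical CDF of `data` as a step function."""
    xs = sorted(data)
    n = len(xs)

    def ecdf(x):
        # Fraction of observations less than or equal to x.
        return bisect.bisect_right(xs, x) / n

    return ecdf


def make_kde(data, bandwidth):
    """Return a Gaussian kernel density estimate as a continuous function."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Synthetic stand-in for one numeric column of a real data set (assumption).
random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)]

step_fn = make_ecdf(data)                   # step function on [0, 1]
smooth_fn = make_kde(data, bandwidth=0.4)   # continuous, non-negative function
```

Swapping in a different column for `data` yields a different pair of functions, which is presumably how a large collection of realistic test functions could be assembled.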


Quickly Generating Lots of Realistic Random Data in R

In this brief post, I show a trick for quickly assembling arbitrarily large samples of real-world data by sampling from all of the data sets included in R packages.


Top 10 Finalist in the Telecom Italia Big Data Challenge for "(Dis)assembling Milan with Big Data"

Congratulations to my group at UCSD (David A. Meyer, Megha Ram, David Rideout and Dongjin Song) for being selected as a top 10 finalist out of 652 teams in the Telecom Italia Big Data Challenge 2014. Check out the UCSD press release describing the project. My corner of the project focuses on using cell phone, [...]

Parsing XML files to a flat dataframe

Markup languages like XML are really handy for structured data that can have multiple values for the same attribute, or attributes that are nested within other attributes in a hierarchical structure. For simple analysis, however, we just want a rectangular data frame with columns and rows, and we need to flatten all that structure. The following [...]
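A minimal sketch of this kind of flattening, written in Python with the standard library rather than in R, might look like the following. It is not the post's own code: the sample XML, the `flatten` and `xml_to_rows` helpers, the dotted column names, and the choice to join repeated values with a semicolon are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: one record has a repeated, nested element.
SAMPLE_XML = """
<records>
  <record id="1">
    <name>alpha</name>
    <tags>
      <tag>red</tag>
      <tag>blue</tag>
    </tags>
  </record>
  <record id="2">
    <name>beta</name>
  </record>
</records>
"""


def flatten(element, prefix=""):
    """Flatten one record into a dict mapping a dotted path to its text."""
    row = {}
    for name, value in element.attrib.items():
        row[f"{prefix}@{name}"] = value
    for child in element:
        path = f"{prefix}{child.tag}"
        if len(child):
            # Nested element: recurse and merge its flattened fields.
            for key, value in flatten(child, prefix=path + ".").items():
                row[key] = f"{row[key]};{value}" if key in row else value
        else:
            # Leaf element: repeated tags are joined with a semicolon.
            text = (child.text or "").strip()
            row[path] = f"{row[path]};{text}" if path in row else text
    return row


def xml_to_rows(xml_text, record_tag):
    """Return (columns, rows) as a rectangular table, padding gaps with ''."""
    root = ET.fromstring(xml_text)
    records = [flatten(rec) for rec in root.iter(record_tag)]
    columns = sorted({key for rec in records for key in rec})
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows


columns, rows = xml_to_rows(SAMPLE_XML, "record")
```

Joining repeated values into one cell keeps one row per record; a long format with one row per repeated value is the other common choice and may suit some analyses better.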

R Speed Gains

For those looking to get more speed out of their R code, check out this post on using C++ directly in R through the Rcpp package, and on compiling R code through the new compiler package, which is coming out in R 2.13.0.