About Me

I’m a computational social scientist and director of the Machine Learning for Social Science Lab (MSSL). I’m based at the Center for Peace and Security Studies (cPASS), Department of Political Science, University of California San Diego (UCSD). Previously I’ve held post-doc positions in Political Science (UCSD), Mathematics (UCSD), and History (Columbia). Before that I studied at Princeton (MA/PhD) and the University of Texas (BA).

[Curriculum vitae], [Github], [Twitter Feed], Contact (rwdouglass at ucsd.edu)

Research

I work on big, dirty, unstructured, observational data. Examples include cell phone calls, military intelligence, text from scientific articles, knowledge bases like Wikidata, newspaper reports, and raw natural images. Below are some example projects that have emerged from that work.


COVID-19


The Data Science of COVID-19 Spread: Some Troubling Current and Future Trends (with Thomas Scherer, Erik Gartzke), Peace Economics, Peace Science and Public Policy, August 17, 2020

[Paper] [Paper Open Access] [Media: Wired] [Media: Slate] [Media: king5]


How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From ‘Coronavirus Perspective’ (Epstein 2020), March 30, 2020

[Paper] [Media: LA Times]


Crowd-sourced COVID-19 Dataset Tracking Involuntary Government Restrictions (TIGR), March 2020

[Github]


Computational Replications and Reviews

[‘Substantial underestimation of SARS-CoV-2 infection in the United States’ (Wu et al. 2020)]

I provide semi-regular reviews of COVID-19 papers and literature reviews on Twitter (30 and counting) in this [thread].


International Security


“Introducing the ICBe Dataset: Very High Recall and Precision Event Extraction from Narratives about International Crises.” (with Thomas Leo Scherer, J. Andrés Gannon, Erik Gartzke, Jon Lindsay, Shannon Carcelli, Jonathan Wilkenfeld, David M. Quinn, Catherine Aiken, Jose Miguel Cabezas Navarro, Neil Lund, Egle Murauskaite, and Diana Partridge). 2022. arXiv:2202.07081.

[Github], [Pre-Print]

We introduce the first high-coverage, high-recall, AND high-precision international conflict database. International Crisis Behaviors Events (ICBe) is a human-coded event dataset of 10k+ international events, covering 117+ behaviors, across 475 international crises (1918-2015).


crisisevents.org (with Thomas Leo Scherer)

A web portal for visualizing and comparing event data. We introduce a new visualization called a crisis-map, which is a combination of a timeline and a directed network graph for displaying complex interactions between actors over time.


“Measuring the Landscape of Civil War” (with Kristen Harkness) Journal of Peace Research, February 15, 2018

[Github], [Ungated Paper], [Ungated Appendix], [Gated Paper]

We show that the natural language geo-referencing strategy you choose determines which downstream econometric result you find. We develop a dataset of ten thousand events from the Mau Mau rebellion, drawn from twenty thousand pages of historical intelligence documents. We apply over a dozen geo-referencing strategies and benchmark them against a known ground truth: exact military grid coordinates, which were available for a subset of the reports.


“Understanding Civil War Violence through Military Intelligence: Mining Civilian Targeting Records from the Vietnam War” Chapter in C.A. Anderton and J. Brauer, eds., Economic Aspects of Genocides, Mass Atrocities, and Their Prevention. New York: Oxford University Press, 2016

[Ungated arXiv preprint]

I investigate a contemporary government database of civilians targeted during the Vietnam War. The data are detailed, with up to 45 attributes recorded for 73,712 individual civilian suspects. I employ an unsupervised machine learning approach of cleaning, variable selection, dimensionality reduction, and clustering. I find support for a simplifying typology of civilian targeting that distinguishes different kinds of suspects and different kinds of targeting methods.


“Why Not Divide and Conquer? Targeted Bargaining and Violence in Civil War” Dissertation, 2012, Princeton University, Department of Politics

[Ungated Dissertation Print]


“MINING THE GAPS: A Text Mining-Based Meta-Analysis of the Current State of Research on Violent Extremism” (with Candace Rondeaux)

[Ungated PDF]

We apply topic modeling to a unique corpus of 3,000 expert-curated articles on violent extremism.


Demography


“Analyzing Social Divisions using Cell Phone Data” (with Orest Bucicovschi, David A. Meyer, Megha Ram, David Rideout, and Dongjin Song)

[Ungated Conference Preprint]

Awarded Best Scientific Paper in the Data for Development (D4D) competition at NetMob 2013, MIT, Cambridge, MA (May 1-3, 2013).


“High resolution population estimates from telecommunications data” (with David A. Meyer, Megha Ram, David Rideout, and Dongjin Song) EPJ Data Science 2015, 4:4

[Ungated Paper at EPJ]

Top 10 finalist of 652 projects in the Telecom Italia Big Data Challenge 2014.

Population censuses are expensive and infrequent, but cell phone data are plentiful and real-time; how can we use one to estimate the other? We investigate the relationship between calling activity and demography at a very high 235 square meter resolution in Northern Italy.


Teaching

I teach a brief course on Machine Learning for new members of my lab and elsewhere by invitation.

[Github]