In this brief follow-up to my post about generating lots of real-world random data in R, I show how to generate lots of realistic functions. By sampling from the PDF and CDF of real-world data, you can quickly generate all manner of continuous and step functions for further experimentation.

The first step is to generate a data frame, **data_generated**, that you can sample from. Following the instructions in that post, you can generate an arbitrarily large sample of real-world variables from all of the popular R packages.
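If you don't have **data_generated** from that post at hand, the sketch below builds a small, hypothetical stand-in from a few datasets that ship with base R (`faithful`, `quakes`, `mtcars`, `airquality`, `rivers`, `precip`). This is an assumption for experimentation only, not the method from the earlier post: each variable is simply resampled with replacement to 4000 rows so that the loop in the next step can take its first 4000 rows.

```r
# Hypothetical stand-in for data_generated, built from datasets bundled with base R.
# NOT the method from the earlier post: each variable is resampled with replacement
# to a common length so that all of them fit in one data frame.
set.seed(42)  # Reproducible resampling
n_rows <- 4000

sources <- list(
  eruptions = faithful$eruptions,
  waiting   = faithful$waiting,
  magnitude = quakes$mag,
  depth     = quakes$depth,
  mpg       = mtcars$mpg,
  ozone     = airquality$Ozone,  # Contains NAs; density(na.rm = TRUE) drops them later
  rivers    = as.numeric(rivers),
  precip    = unname(precip)
)

data_generated <- as.data.frame(
  lapply(sources, function(v) sample(v, size = n_rows, replace = TRUE))
)
str(data_generated)
```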

The second step is to select a variable at random and estimate its probability density function (PDF). With a Gaussian kernel the estimate is a smooth function; with a rectangular kernel it is a step function. Next, estimate the cumulative distribution function (CDF), which is weakly monotonic (it never decreases). Multiply the two and you have access to a wide range of very plausible real-world functional forms. As a final step, select a smaller window, capturing only a small part of the global shape and weakening any empirical regularity that might exist from variable to variable. The code below also flips each function vertically at random and standardizes the window.
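To see these steps on a single variable, here is a minimal sketch that uses `faithful$eruptions` from base R in place of a randomly chosen column (an illustrative assumption). It builds both the smooth (Gaussian) and the step (rectangular) version, checks that the running sum of the PDF never decreases, and keeps a standardized window of 1000 points out of a 4000-point grid. The helper name `make_function` is hypothetical.

```r
# Worked example of the PDF x CDF recipe on one variable.
# faithful$eruptions and the helper name make_function are illustrative choices.
set.seed(1)
grid_points <- 4000  # Points at which the density is evaluated
window_len  <- 1000  # Length of the window kept at the end

make_function <- function(values, kernel) {
  pdf <- density(values, kernel = kernel, n = grid_points, na.rm = TRUE)
  cdf <- cumsum(pdf$y)            # Unnormalized CDF: running sum of the PDF
  stopifnot(all(diff(cdf) >= 0))  # Weakly monotonic because density values are >= 0
  combined <- pdf$y * cdf         # Product of the PDF and the CDF
  start <- sample.int(grid_points - window_len, size = 1)  # Random window start
  window <- combined[start:(start + window_len - 1)]
  as.numeric(scale(window))       # Standardize to mean 0, sd 1
}

smooth_fn <- make_function(faithful$eruptions, "gaussian")     # Continuous shape
step_fn   <- make_function(faithful$eruptions, "rectangular")  # Step-wise shape
length(smooth_fn)  # 1000
```

Plotting `smooth_fn` and `step_fn` side by side (for example with `plot(step_fn, type = "l")`) is a quick way to see the difference the kernel makes.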

```r
n <- 4000     # Rows taken from each column; also the number of points at which the PDF is evaluated
k <- 160      # Number of desired series
bins <- 1000  # Length of the window kept from each function
outcomes <- data.frame(x = 1:bins)  # Placeholder for the x axis

for (i in 1:k) {
  try({
    col <- sample.int(ncol(data_generated), size = 1)  # Pick a column to sample
    kernel <- c("gaussian", "rectangular")[sample.int(2, size = 1)]  # Continuous or step-wise smoothing
    values <- as.numeric(data_generated[[col]][1:n])
    pdf <- density(values, kernel = kernel, na.rm = TRUE, n = n)  # Take the PDF on an n-point grid
    cdf <- cumsum(pdf$y)                 # Take the (unnormalized) CDF
    flip <- sample(c(-1, 1), size = 1)   # Should it be flipped vertically?
    combined <- pdf$y * cdf * flip       # Combine the two and possibly flip
    a <- sample.int(n - bins, size = 1)  # Random start of the window
    b <- a + bins - 1
    final <- as.numeric(scale(combined[a:b]))  # Take only a partial window and standardize it
    outcomes[[paste0("series", i)]] <- final   # Append as a new column; a failed draw leaves no gap
  })
}
```
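One caveat with the loop above: `scale()` returns `NaN` for a window with zero standard deviation, which can happen when the window falls entirely in a stretch where the estimated density is exactly zero. The sketch below, my own optional cleanup step rather than part of the original workflow, drops such series before plotting. The function name `drop_degenerate` and the toy data are hypothetical.

```r
# Optional cleanup sketch: drop series whose standardized window is not entirely finite.
# scale() gives NaN for a constant window because its standard deviation is 0.
drop_degenerate <- function(outcomes) {
  keep <- vapply(outcomes[-1], function(v) all(is.finite(v)), logical(1))
  outcomes[, c(TRUE, keep), drop = FALSE]  # Always keep the x column
}

# Toy input: one usable series and one constant series that scale() turns into NaN
toy <- data.frame(x = 1:5,
                  good = as.numeric(scale(c(1, 3, 2, 5, 4))),
                  flat = as.numeric(scale(rep(0, 5))))
cleaned <- drop_degenerate(toy)
names(cleaned)  # "x" "good"
```

On the real data this would be applied as `outcomes <- drop_degenerate(outcomes)` before melting.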

The final product will be a data frame with a column `x` for the x axis and one plausible function in each of the remaining columns. Using the code below, we can visualize each function.

```r
library(ggplot2)
library(reshape)

df <- melt(outcomes, id.vars = "x", variable_name = "series")  # Melt it for faceted plotting

outcome <- ggplot(df, aes(x = x, y = value, color = series)) +
  geom_point(size = 0.5) +
  facet_wrap(~ series, nrow = 10, scales = "free_y") +
  theme_bw() +
  theme(strip.background = element_blank(), strip.text = element_blank()) +
  theme(axis.line = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "none",
        panel.background = element_blank(),
        panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_blank())

print(outcome)  # Draw the grid of functions
```
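If you would rather not depend on the reshape package, base R's `stack()` can produce the same long format (`x`, `series`, `value`). Here is a minimal sketch on a small, hypothetical `toy_outcomes` data frame standing in for the one built by the loop:

```r
# Base-R alternative to reshape::melt for the long format used by facet_wrap.
# toy_outcomes is a hypothetical stand-in for the outcomes data frame.
toy_outcomes <- data.frame(x = 1:4,
                           series1 = c(0.1, 0.2, 0.3, 0.4),
                           series2 = c(1, 0, 1, 0))

long <- stack(toy_outcomes[-1])       # Columns 'values' and 'ind' (a factor of column names)
names(long) <- c("value", "series")
long$x <- rep(toy_outcomes$x, times = ncol(toy_outcomes) - 1)  # stack() goes column by column
long <- long[, c("x", "series", "value")]
head(long, 5)
```

Applied to the real `outcomes`, the resulting data frame has the same column names as `df` and can be passed to the same `ggplot()` call.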

The functions produced look very plausible. They include linear, quadratic, and higher-order polynomial shapes. They include spike and step functions as well as a couple of cyclical ones. They also include some more bizarre alternating ones that switch between one function and another at unpredictable intervals. A few even look self-similar, like stock prices over time.

