The binomial distribution converges to the Poisson distribution in the limit where the number of trials n goes to infinity and the probability of success p goes to zero, such that the expected number of successes np stays fixed at λ.
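A quick numerical check of this limit, as a minimal sketch using only the Python standard library: hold λ = np fixed, let n grow, and watch the largest gap between the two probability mass functions shrink. The helper names (`binom_pmf`, `poisson_pmf`, `max_gap`) and the choice λ = 0.7 are illustrative assumptions, not part of the original analysis.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_gap(n, lam, k_max=10):
    """Largest absolute pmf difference over k = 0..k_max, with p = lam / n."""
    p = lam / n
    return max(
        abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )

lam = 0.7  # illustrative rate (assumption)
gaps = {n: max_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n = {n:>5}, p = {lam / n:.5f}, max |binomial - Poisson| = {gap:.2e}")
```

The gap should shrink roughly in proportion to 1/n, which is why the Poisson model suits rare events counted over many opportunities.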
The canonical example of a Poisson process is deaths by horse kick in the Prussian army, collected by Ladislaus Josephovich Bortkiewicz (1868–1931) in his book “The Law of Small Numbers” (Bortkiewicz and Bortkevič 1898). (Note that the distribution itself was already well known by then: it had been introduced in 1711 and previously used for estimating wrongful convictions.)
import numpy as np

toy_vector_numeric = np.array([1, 2, 3, 4, 5])
toy_vector_character = np.array(['a', 'b', 'c', 'd', 'e'])
toy_list = ['a', '1', True, ['red', 'green']]
toy_dictionary = {'a': 1, 'b': 2, 'c': 3}

from jax import numpy as jnp

toy_vector_numeric_jax = jnp.array([1, 2, 3, 4, 5])
# toy_vector_character_jax = jnp.array(['a','b','c','d','e'])  # only numeric is allowed in jax
WARNING:jax._src.lib.xla_bridge:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
import pandas as pd

toy_df = pd.DataFrame(data={'id': ['unit1', 'unit2', 'unit3'],
                            'y': [1, 2, 3],
                            'x': [3, 2, 1]})

import torch
import tensorflow as tf
import pyarrow as pa
library(DBI)
# Create an ephemeral in-memory RSQLite database
#con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
#dbListTables(con)
#dbWriteTable(con, "mtcars", mtcars)
#dbListTables(con)

#Configuration failed because libpq was not found. Try installing:
#* deb: libpq-dev libssl-dev (Debian, Ubuntu, etc)
#install.packages('RPostgres')
#remotes::install_github("r-dbi/RPostgres")
#Took forever because my file permissions were broken
#pg_lsclusters
require(RPostgres)
# Connect to the default postgres database
#I had to follow these instructions and create both a username and database that matched my ubuntu name
#https://www.digitalocean.com/community/tutorials/how-to-install-postgresql-on-ubuntu-20-04-quickstart
con_Postgres <- dbConnect(RPostgres::Postgres())
DROP TABLE IF EXISTS toy_df;
CREATE TABLE IF NOT EXISTS toy_df (id varchar(5), y INTEGER, x INTEGER);
INSERT INTO toy_df (id, y, x) VALUES ('unit1', 1, 3), ('unit2', 2, 2), ('unit3', 3, 1);
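For readers without a PostgreSQL server, the same three statements can be tried against an in-memory SQLite database from Python. This is a sketch under that assumption: the standard-library `sqlite3` module stands in for the PostgreSQL connection used in this document, and SQLite's type handling is looser than PostgreSQL's.

```python
import sqlite3

# In-memory SQLite database (assumption: a stand-in for the PostgreSQL connection)
con = sqlite3.connect(":memory:")

con.execute("DROP TABLE IF EXISTS toy_df")
con.execute("CREATE TABLE IF NOT EXISTS toy_df (id varchar(5), y INTEGER, x INTEGER)")

# Parameterized insert of the same three rows as the SQL above
con.executemany(
    "INSERT INTO toy_df (id, y, x) VALUES (?, ?, ?)",
    [("unit1", 1, 3), ("unit2", 2, 2), ("unit3", 3, 1)],
)
con.commit()

# Read the table back as a list of tuples
rows = con.execute("SELECT id, y, x FROM toy_df ORDER BY id").fetchall()
print(rows)

con.close()
```

The table disappears when the connection closes, which makes this convenient for toy examples.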
myURL <- 'https://raw.githubusercontent.com/SmilodonCub/MSDS2020_Bridge/master/VonBort.csv'
gitVonBort <- read.csv(url(myURL)) # read.csv is a built-in function that reads the CSV data as an R data frame
# head(gitVonBort) # head() displays the first 6 rows of the data frame by default
head(gitVonBort, 4) # the second argument customizes the number of rows shown
  deaths year corps fisher
1      0 1875     G     no
2      0 1875     I     no
3      0 1875    II    yes
4      0 1875   III    yes
deaths: number of deaths in a year (int)
year: year of the data entry (int)
corps: a factor indicating which corps the data entry corresponds to
fisher: a factor indicating whether the corps was included in the analysis performed by R.A. Fisher in 1925

The fourth feature, ‘fisher’, is less intuitive. Bortkiewicz established the fit of the Poisson distribution to his data qualitatively; Fisher was the first to demonstrate quantitatively, via the chi-squared test, the goodness of fit of the Poisson probability model to the horse-kick data (Merikoski 2017). Some data were excluded from Fisher's analysis because they were considered to come from heterogeneous corps. For instance, corps ‘G’ was an elite cavalry corps. For this analysis, we will use the same subset of the horse-kick data that Fisher used, that is, the data entries where ‘fisher’ = ‘yes’.
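To make the chi-squared goodness-of-fit idea concrete, here is a minimal standard-library sketch. It assumes the commonly cited frequency table for the 200 retained corps-years (109, 65, 22, 3 and 1 corps-years with 0 to 4 deaths), typed in by hand rather than read from VonBort.csv. Pooling "3 or more" deaths into one cell is an illustrative choice and not necessarily the grouping Fisher used.

```python
import math

# Assumed frequency table: corps-years with 0, 1, 2, 3, 4 deaths
# (commonly cited counts for the fisher == "yes" subset; not read from the CSV)
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())                         # number of corps-years
lam = sum(k * c for k, c in observed.items()) / n  # sample mean, used as the Poisson rate

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Pool the sparse tail (3 or more deaths) into a single cell
obs_cells = [observed[0], observed[1], observed[2], observed[3] + observed[4]]
exp_cells = [n * poisson_pmf(k, lam) for k in range(3)]
exp_cells.append(n - sum(exp_cells))               # expected count for "3 or more"

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
dof = len(obs_cells) - 1 - 1                       # cells - 1 - one estimated parameter
p_value = math.exp(-chi2 / 2)                      # exact chi-squared tail when dof == 2

print(f"lambda = {lam:.2f}, chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```

A statistic well below the 5% critical value for 2 degrees of freedom (about 5.99) means the test finds no evidence against the Poisson model.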
gitVonBort %>% pull(deaths) %>% hist()
gitVonBort %>% pull(deaths) %>% mean()
[1] 0.7
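A Poisson variable has equal mean and variance, so comparing the two is a quick sanity check on the mean computed above. The sketch below assumes the commonly cited frequency table for all 280 corps-years (144, 91, 32, 11 and 2 corps-years with 0 to 4 deaths), entered by hand rather than read from the CSV. That table reproduces the mean of 0.7.

```python
import statistics

# Assumed frequency table for the full data set (all 14 corps, 20 years);
# typed in by hand, not read from VonBort.csv
counts = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}

# Expand the table into one observation per corps-year
deaths = [k for k, c in counts.items() for _ in range(c)]

mean_deaths = statistics.mean(deaths)
var_deaths = statistics.variance(deaths)  # sample variance (n - 1 denominator)

print(f"n = {len(deaths)}, mean = {mean_deaths:.3f}, variance = {var_deaths:.3f}")
print(f"variance / mean = {var_deaths / mean_deaths:.3f}")
```

A variance-to-mean ratio close to 1 is consistent with a Poisson model, while a ratio well above 1 would suggest overdispersion.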
Tidyverse
DataTable
Arrow
0.2 Python
0.2.0.1 3.x / math / statistics
0.2.0.2 NumPy / SciPy / scikit-learn
0.2.0.3 Pandas
0.3 Jax
0.4 Numpyro
0.5 Stan
0.6 Torch
0.7 Tensorflow
References
Bortkiewicz, Ladislaus von, and Vladislav I. Bortkevič. 1898. Das Gesetz der kleinen Zahlen. B.G. Teubner.