Independent and identically distributed random variables (IID)
Instance of: Property of a set of random variables
AKA: i.i.d., iid, IID; random sample
Distinct from:
English: Independent means the value drawn for one variable doesn’t influence which value will be drawn for another; equivalently, you learn nothing about the value of one from the value of another. Identically distributed means every variable has the same probability distribution: if one is drawn from a normal(0,1), then so is the other. Substantively, identical distribution rules out trends, where, say, the distribution’s mean or variance drifts as you take samples from it, e.g. with a time series.
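The contrast between IID draws and a drifting series can be sketched in a few lines. This is a minimal illustration, not a formal test: the seed, the sample size, the drift from 0 to 2, and the helper `half_mean_gap` are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 10_000

# IID: every draw comes from the same normal(0, 1), independently of the others.
iid_draws = rng.normal(loc=0.0, scale=1.0, size=n)

# Not identically distributed: the mean drifts from 0 to 2 across the draws,
# the kind of trend a time series might show.
drifting_mean = np.linspace(0.0, 2.0, n)
drifting_draws = rng.normal(loc=drifting_mean, scale=1.0)


def half_mean_gap(draws):
    """Mean of the second half of the draws minus the mean of the first half."""
    first, second = np.array_split(draws, 2)
    return second.mean() - first.mean()


iid_gap = half_mean_gap(iid_draws)
drift_gap = half_mean_gap(drifting_draws)
print(f"IID gap: {iid_gap:.3f}, drifting gap: {drift_gap:.3f}")
```

For the IID draws the gap should sit near 0, while for the drifting draws it should sit near 1, since the second half is centered roughly one unit higher than the first.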
Formalization:
Two random variables \(X\) and \(Y\) are identically distributed if and only if \[
F_X(x) = F_Y(x) \quad \forall x \in I
\] ::: {.column-margin} Where \(F\) is the cumulative distribution function. :::
And independent if and only if \[
F_{X,Y}(x,y) = F_X(x) \, F_Y(y) \quad \forall x,y \in I
\]
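The independence condition can be probed numerically by comparing an empirical joint CDF with the product of the empirical marginals. The sketch below is only a sanity check at a single point, so a small gap is necessary but not sufficient for independence. The variables `y_indep` and `y_dep`, the helper `factorization_gap`, and the evaluation point (0, 0) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
n = 50_000

x = rng.normal(size=n)
y_indep = rng.normal(size=n)          # drawn independently of x
y_dep = x + 0.1 * rng.normal(size=n)  # strongly dependent on x


def factorization_gap(a, b, s, t):
    """Estimate |F_{A,B}(s, t) - F_A(s) * F_B(t)| from empirical CDFs."""
    joint = np.mean((a <= s) & (b <= t))
    product = np.mean(a <= s) * np.mean(b <= t)
    return abs(joint - product)


gap_indep = factorization_gap(x, y_indep, 0.0, 0.0)
gap_dep = factorization_gap(x, y_dep, 0.0, 0.0)
print(f"independent pair: {gap_indep:.3f}, dependent pair: {gap_dep:.3f}")
```

For the independent pair the gap should be close to 0 up to sampling noise. For the dependent pair the joint probability at (0, 0) is close to 0.5 while the product of the marginals is close to 0.25, so the gap is large.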
import numpy as np
toy_vector_numeric = np.array([1,2,3,4,5])
toy_vector_character = np.array(['a','b','c','d','e'])
toy_list = ['a','1',True,['red','green']]
toy_dictionary = { 'a':1 , 'b':2, 'c':3}
from jax import numpy as jnp
toy_vector_numeric_jax = jnp.array([1,2,3,4,5])
#toy_vector_character_jax = jnp.array(['a','b','c','d','e']) #only numeric is allowed in jax
import pandas as pd
toy_df = pd.DataFrame(data={'id': ['unit1','unit2','unit3'], 'y': [1, 2, 3], 'x': [3, 2, 1]})
import torch
import tensorflow as tf
import pyarrow as pa
library(DBI)
# Create an ephemeral in-memory RSQLite database
#con <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
#dbListTables(con)
#dbWriteTable(con, "mtcars", mtcars)
#dbListTables(con)
#Configuration failed because libpq was not found. Try installing:
#* deb: libpq-dev libssl-dev (Debian, Ubuntu, etc)
#install.packages('RPostgres')
#remotes::install_github("r-dbi/RPostgres")
#Took forever because my file permissions were broken
#pg_lsclusters
require(RPostgres)
# Connect to the default postgres database
#I had to follow these instructions and create both a username and database that matched my ubuntu name
#https://www.digitalocean.com/community/tutorials/how-to-install-postgresql-on-ubuntu-20-04-quickstart
con_Postgres <- dbConnect(RPostgres::Postgres())
DROP TABLE IF EXISTS toy_df;
CREATE TABLE IF NOT EXISTS toy_df (id varchar(5), y INTEGER, x INTEGER);
INSERT INTO toy_df (id, y, x) VALUES ('unit1',1,3), ('unit2',2,2), ('unit3',3,1);