Statistics Literature

Coefficient of variation - In probability theory and statistics, the coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation σ to the mean μ (or to its absolute value, |μ|).
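The definition above is a one-liner in code; a minimal sketch using only Python's standard library (the function name and sample data are illustrative):

```python
import statistics

def coefficient_of_variation(data):
    """CV = sample standard deviation / |mean|, often reported as a percentage."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return sd / abs(mean)

data = [10.0, 12.0, 11.0, 13.0, 9.0]
cv = coefficient_of_variation(data)
print(f"CV = {cv:.3f} ({cv * 100:.1f}%)")
```

Because the CV divides by the mean, it is unitless and lets you compare spread across measurements on different scales.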

Index of dispersion - In probability theory and statistics, the index of dispersion, dispersion index, coefficient of dispersion, relative variance, or variance-to-mean ratio (VMR), like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is a measure used to quantify whether a set of observed occurrences are clustered or dispersed compared to a standard statistical model.

It is defined as the ratio of the variance σ² to the mean μ,

D = σ² / μ.
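A Poisson process has D = 1, which is why the VMR is the usual yardstick for clustering versus dispersion. A quick sketch with the standard library (the binomial draw below is just a stand-in for an approximately Poisson count; all names are illustrative):

```python
import random
import statistics

def index_of_dispersion(data):
    """VMR: population variance divided by the mean."""
    return statistics.pvariance(data) / statistics.fmean(data)

random.seed(0)
# Binomial(500, 0.01) counts approximate a Poisson(5) series, so VMR should be near 1.
counts = [sum(random.random() < 0.01 for _ in range(500)) for _ in range(2000)]
print(index_of_dispersion(counts))
```

Values well above 1 indicate clustering (over-dispersion); values well below 1 indicate under-dispersion.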

Fisher (1925)

Fisher, R. A. “Statistical Methods for Research Workers,” 13th ed. London: Oliver and Boyd, 1925, 99–101.


Statistics may be regarded as (i.) the study of populations, (ii.) as the study of variation, (iii.) as the study of methods of the reduction of data.


The idea of a population is to be applied not only [p. 3] to living, or even material, individuals. If an observation, such as a simple measurement, be repeated a number of times, the aggregate of the results is a population of measurements.


The conception of statistics as the study of variation is the natural outcome of viewing the subject as the study of populations; for a population of individuals in all respects identical is completely described by a description of any one individual, together with the number in the group. The populations which are the object of statistical study always display variation in one or more respects.

Frequency distribution

Frequency distributions are of various kinds, according as the number of classes in which the population is distributed is finite or infinite, and also according as the intervals which separate the classes are finite or infinitesimal.

Correlation / Covariation

Especially important is the study of the simultaneous variation of two or more variates. This study, arising principally out of the work of Galton and Pearson, is generally known in English under the name of Correlation, but by some continental writers as Covariation.

parameters / statistics

parameters are the characters of the population. If we could know the exact specification of the population, we should know all (and more than) any sample from [p. 8] the population could tell us. We cannot in fact know the specification exactly, but we can make estimates of the unknown parameters, which will be more or less inexact. These estimates, which are termed statistics, are of course calculated from the observations. If we can find a mathematical form for the population which adequately represents the data, and then calculate from the data the best possible estimates of the required parameters, then it would seem that there is little, or nothing, more that the data can tell us; we shall have extracted from it all the available relevant information.

The three problems of reduction of data:


(i.) Problems of Specification, which arise in the choice of the mathematical form of the population. [p. 9]


(ii.) Problems of Estimation, which involve the choice of method of calculating, from our sample, statistics fit to estimate the unknown parameters of the population.


(iii.) Problems of Distribution, which include the mathematical deduction of the exact nature of the distribution in random samples of our estimates of the parameters, and of other statistics designed to test the validity of our specification (tests of Goodness of Fit).


Theory of Probability. For a given population we may calculate the probability with which any given sample will occur, and if we can solve the purely mathematical problem presented, we can calculate the probability of occurrence of any given statistic calculated from such a sample.


Argues that probability cannot tell you what the population looks like from a sample, which is what people sometimes called inverse probability. The process and mathematics required are different, so he prefers to call it likelihood.

For many years, extending over a century and a half, attempts were made to extend the domain of the idea of probability to the deduction of inferences respecting populations from assumptions (or observations) respecting samples. Such inferences are usually distinguished under the heading of Inverse Probability, and have at times gained wide acceptance. This is not the place to enter into the subtleties of a prolonged controversy; it will be sufficient in this general outline of the scope of Statistical Science to express my personal conviction, which I have sustained elsewhere, that the theory of inverse probability is founded upon an error, and must be wholly rejected. Inferences respecting populations, from which known samples have been drawn, cannot be expressed in terms of probability, except in the trivial case when the population is itself a sample of a super-population the specification of which is known with accuracy. This is not to say that we cannot draw, from knowledge of a sample, inferences respecting the population from which the sample was drawn, but that the mathematical concept of probability is inadequate to express our mental confidence or diffidence in making such inferences, and that the mathematical quantity which appears to be appropriate for measuring our order of preference among different possible populations does not in fact obey the laws of probability. [p. 11] To distinguish it from probability, I have used the term “Likelihood” to designate this quantity; since both the words “likelihood” and “probability” are loosely used in common speech to cover both kinds of relationship.

Consistency - Consistent statistics tend toward the correct value as n goes to infinity; in the common cases their errors also tend toward a normal distribution.

Consistent statistics, on the other hand, all tend more and more nearly to give the correct values, as the sample is more and more increased; at any rate, if they tend to any fixed value it is not to an incorrect one. In the simplest cases, with which we shall be concerned, they not only tend to give the correct value, but the errors, for samples of a given size, tend to be distributed in a well-known distribution (of which more in Chap. III.) known as the Normal Law of Frequency of Error, or more simply as the normal distribution. The liability to error may, in such cases, be expressed by calculating the mean value of the squares of these errors, a value which is known as the variance; and in the class of cases with which we are concerned, the variance falls off with increasing samples, in inverse proportion to the number in the sample.
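Fisher's claim that the variance falls off in inverse proportion to the sample size is easy to check by simulation; a minimal sketch with the standard library (uniform(0, 1) population, whose variance is 1/12; all names and sizes are illustrative):

```python
import random
import statistics

random.seed(1)

def var_of_mean(n, trials=4000):
    """Empirical variance of the sample mean for samples of size n
    drawn from a uniform(0, 1) population (population variance 1/12)."""
    means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]
    return statistics.pvariance(means)

# Each 4x increase in n should cut the variance of the mean by about 4x.
for n in (10, 40, 160):
    print(n, var_of_mean(n), (1 / 12) / n)
```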

Efficiency - Consistent and with the least possible variance for a given n

MLE tends to produce efficient statistics

Consequently a special importance belongs to a smaller group of statistics, the error distributions of which tend to the normal distribution, as the sample is increased, with the least possible variance. We may thus separate off from the general body of consistent statistics a group of especial value, and these are known as efficient statistics.
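The gain from an efficient statistic shows up when comparing the sample mean with the sample median as estimates of the centre of a normal population: for large samples the median's variance is about π/2 times the mean's. A simulation sketch using the standard library (sample sizes and seed are illustrative):

```python
import math
import random
import statistics

random.seed(2)

n, trials = 100, 3000
means, medians = [], []
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# For normal data the median's variance is ~pi/2 times the mean's,
# i.e. the median needs roughly 57% more data for the same precision.
ratio = statistics.pvariance(medians) / statistics.pvariance(means)
print(ratio, math.pi / 2)
```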


There is, however, one class of statistics, including some of the most frequently recurring examples, which is of theoretical interest for possessing the remarkable property that, even in small samples, a statistic of this class alone includes the whole of the relevant information which the observations contain. Such statistics are distinguished by the term sufficient, and, in the use of small samples, sufficient statistics, when they exist, are definitely superior to other efficient statistics. Examples of sufficient statistics are the arithmetic mean of samples from the normal distribution, or from the Poisson Series; it is the fact of providing sufficient statistics for these two important types of distribution which gives to the arithmetic mean its theoretical importance. The method of maximum likelihood leads to these sufficient statistics where they exist.
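Fisher's point that maximum likelihood leads to the sufficient statistic can be illustrated for the Poisson series, where the MLE of the rate coincides with the arithmetic mean. A hedged sketch (the grid search and the data are purely illustrative, not Fisher's procedure):

```python
import math
import statistics

def poisson_log_likelihood(lam, data):
    """Log-likelihood of rate lam for observed counts, up to the usual terms."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

data = [3, 5, 2, 4, 6, 3, 4]

# Crude grid search for the rate maximizing the likelihood.
grid = [x / 1000 for x in range(1, 10001)]
mle = max(grid, key=lambda lam: poisson_log_likelihood(lam, data))

# The maximizer agrees with the sample mean: the mean is sufficient here.
print(mle, statistics.fmean(data))
```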


A statistic is a value calculated from an observed sample with a view to characterising the population [p. 44] from which it is drawn. For example, the mean of a number of observations … Such statistics are of course variable from sample to sample, and the idea of a frequency distribution is applied with especial value to the variation of such statistics.

total frequency, or probability integral

In practical applications we do not so often want to know the frequency at any distance from the centre as the total frequency beyond that distance; this is represented by the area of the tail of the curve cut off at any point.
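The tail area described here is what a table of the probability integral supplies; for the normal distribution it can be computed directly from the complementary error function in Python's standard library (a sketch, not Fisher's tables):

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variate, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Two-tailed P for a few deviations, in standard-error units.
for z in (1.0, 1.96, 2.0, 3.0):
    print(z, 2 * upper_tail(z))
```

At z = 1.96 the two-tailed area is almost exactly 0.05, which is the source of the "twice the standard error" rule of thumb.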

P < 0.05 corresponds to a deviation of about 2 standard errors (rule of thumb)

The value of the deviation beyond which half the observations lie is called the quartile distance, and bears to the standard deviation the ratio .67449. It is therefore a common practice to calculate the standard error and then, multiplying it by this factor, to obtain the probable error. The probable error is thus about two-thirds of the standard error, and as a test of significance a deviation of three times [p. 48] the probable error is effectively equivalent to one of twice the standard error. The common use of the probable error is its only recommendation; when any critical test is required the deviation must be expressed in terms of the standard error in using the probability integral table.
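The .67449 factor is just the upper quartile of the standard normal distribution; a bisection sketch recovers it (the `normal_quantile` helper below is illustrative, not a library routine):

```python
import math

def normal_quantile(p, lo=-10.0, hi=10.0):
    """Invert the standard normal CDF by bisection (a simple sketch)."""
    cdf = lambda z: 0.5 * math.erfc(-z / math.sqrt(2))
    for _ in range(200):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Half the observations lie within +/- this many standard deviations of the mean,
# so probable error = 0.67449 * standard error.
quartile = normal_quantile(0.75)
print(quartile)
```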

The first two moments of the normal distribution are the mean and variance. They are also sufficient statistics to fully describe a normal distribution, e.g. you don't need skew or higher moments because it's symmetric.

You can test for departure from normality by calculating higher moments and seeing if they’re far from zero.
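The standardized third moment (skewness) is 0 for a normal distribution, and the standardized fourth moment is 3, so departures from (0, 3) signal non-normality. A sketch comparing normal and exponential samples (all names and sample sizes are illustrative):

```python
import random
import statistics

def standardized_moment(data, k):
    """k-th moment about the mean, scaled by sigma^k."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean((x - m) ** k for x in data) / s ** k

random.seed(3)
normal_data = [random.gauss(0, 1) for _ in range(20000)]
skewed_data = [random.expovariate(1) for _ in range(20000)]

# Print skewness and excess kurtosis (fourth moment minus 3) for each sample:
# near (0, 0) for normal data, far from it for exponential data.
for name, d in [("normal", normal_data), ("exponential", skewed_data)]:
    print(name, standardized_moment(d, 3), standardized_moment(d, 4) - 3)
```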

