Some Derivations and Criticisms

Simpson Index

$\sum_i p_i^2$ with $p_i = n_i / \sum_j n_j$, where $n_i$ is the abundance of species $i$

derivation: the probability of choosing a specific species $i$ twice is $p_i \cdot p_i$, and thus the probability that two chosen specimens both belong to the same species is the sum of $p_i^2$ over all species $i$.

criticism: almost none. It is easy to understand and, as a true probability, bounded between 0 and 1. Of course we shouldn't interpret the "choosing" of a specimen as catching one (which definitely has a different probability), but rather say that the sample is representative of the population, which is still questionable but difficult to avoid.
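
As a small illustration of the formula above, here is a minimal Python sketch. The function name `simpson_index` and the choice of a plain list of counts as input are assumptions made for this example; the sum of squared proportions corresponds to drawing two specimens with replacement.

```python
def simpson_index(abundances):
    """Return sum_i p_i^2 for a list of species abundances n_i.

    Hypothetical helper for illustration; the name is not from the text.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("need at least one specimen")
    # p_i = n_i / sum_j n_j; the index is the sum of the squared p_i
    return sum((n / total) ** 2 for n in abundances)


# Two equally abundant species: two random picks match half of the time.
print(simpson_index([5, 5]))
```

A single species gives 1, and many equally abundant species push the value towards 0.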

Rarefaction

$E(S) = \sum_i \left(1 - \binom{N - N_i}{n} / \binom{N}{n}\right)$ where $N_i$ is the number of specimens of species $i$ in a population, $N$ is the total number of specimens and $n$ the size of the sample (in specimens)

derivation: the expected number of species in a sample drawn from an abundance vector is the sum, over all species $i$ in the population, of the probability that species $i$ appears in the sample, which in turn is one minus the probability of missing species $i$. The probability of missing species $i$ is the number of possible samples without that species, $\binom{N - N_i}{n}$, divided by the total number of possible samples, $\binom{N}{n}$.

criticism: we have to assume both that the abundance distribution of our measured sample is representative of the distribution in the population and that choosing a specimen from the population is a Laplace experiment (all elementary probabilities are equal). Neither assumption holds, and so a proper application of rarefaction requires that we know in advance how close we are to saturation, and is thus circular. What is especially dangerous is that every rarefaction curve shows some sort of convergence, like in this study. And ceterum censeo, I don't see how Monte Carlo methods like jackknifing make sense if we have an analytical formula.
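
The analytical formula is cheap to evaluate directly, which is the point of the last remark. The following is a minimal Python sketch, assuming the population is given as a list of counts $N_i$; the function name `expected_species` is made up for this example.

```python
from math import comb


def expected_species(population, n):
    """Expected number of species in a sample of n specimens.

    population: list of counts N_i, one per species (illustrative input format).
    """
    N = sum(population)
    if not 0 <= n <= N:
        raise ValueError("sample size must lie between 0 and N")
    all_samples = comb(N, n)  # total number of possible samples
    # comb(N - N_i, n) counts the samples that miss species i;
    # it is 0 when fewer than n specimens remain without that species.
    return sum(1 - comb(N - N_i, n) / all_samples for N_i in population)


# Evaluating the formula for every sample size gives the rarefaction curve.
curve = [expected_species([3, 2, 1], n) for n in range(7)]
print(curve)
```

The curve starts at 0 for an empty sample and reaches the full species count when the sample is the whole population, whatever the abundances are.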


Entropy

$\text{entropy} = -\sum_i p_i \ln(p_i)$

derivation: coming soon

criticism: the formula usually given represents the entropy of an ideal mixture, e.g. of noble gases, which behave like an ideal gas, $pV = nRT$. Only the ideal gas constant $R$ is missing. With interacting species, like polar molecules, this formula doesn't make much sense, and if we deal with a chemical reaction, the entropy of mixing usually makes a negligible contribution. And flies do react with each other... (The same arguments hold if we choose Boltzmann's microstates for a derivation, or if we rename the term entropy to the thereby defined "information content".)
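
Whatever one thinks of the concept, the number itself is easy to compute. A minimal Python sketch, assuming abundances are given as a list of counts and that species with zero count are skipped (an assumption of this example, since $\ln(0)$ is undefined); the name `entropy` is chosen for illustration.

```python
import math


def entropy(abundances):
    """Return -sum_i p_i * ln(p_i) for a list of species abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("need at least one specimen")
    # Assumption: species with zero abundance contribute nothing.
    probs = [n / total for n in abundances if n > 0]
    return -sum(p * math.log(p) for p in probs)


# Four equally abundant species give ln(4).
print(entropy([1, 1, 1, 1]), math.log(4))
```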

Evenness

$\text{entropy} / \max(\text{entropy}) = \sum_i p_i \ln(p_i) / \sum_i P \ln(P)$ where $p_i$ is again the probability of choosing species $i$ in the population and $P$ is the probability of choosing any one species if all species were equally abundant. Because the $P$ sum to 1, the denominator reduces to $\ln(P)$.

derivation: none, once entropy is defined

criticism: even if the concept of entropy in ecology is questionable, the evenness, as a number between 0 and 1, gives a feeling for how close a distribution is to the case where all species are equally abundant.
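
A minimal Python sketch of this ratio follows. It assumes that the maximum entropy for $S$ equally abundant species is $\ln(S)$, which follows from setting every $p_i = 1/S$, and that only species actually present are counted. The function name `evenness` and the error for fewer than two species are choices made for this example.

```python
import math


def evenness(abundances):
    """Return entropy / max(entropy) for a list of species abundances."""
    counts = [n for n in abundances if n > 0]  # assumption: ignore absent species
    S = len(counts)
    if S < 2:
        raise ValueError("evenness is undefined for fewer than two species")
    total = sum(counts)
    ent = -sum((n / total) * math.log(n / total) for n in counts)
    # With all p_i = 1/S the entropy reaches its maximum, ln(S).
    return ent / math.log(S)


print(evenness([2, 2, 2]))   # all species equally abundant
print(evenness([98, 1, 1]))  # one species dominates
```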

Chao Estimator

$S_{estimated} = S_{obs} + f_1^2 / (2 f_2)$ where $f_1$ is the number of singletons in a sample (species caught only once) and $f_2$ the number of doubletons (species caught exactly twice)

derivation: Chao, A. (1984) Non-parametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11, 265–270.

criticism: not yet
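
For completeness, a minimal Python sketch of the estimator exactly as written above. The function name `chao_estimate` is made up for this example, and raising an error when there are no doubletons is a choice of this sketch, since the formula then divides by zero.

```python
def chao_estimate(abundances):
    """Return S_obs + f1^2 / (2 * f2) for a list of sample abundances."""
    observed = [n for n in abundances if n > 0]
    s_obs = len(observed)                        # species seen at least once
    f1 = sum(1 for n in observed if n == 1)      # singletons
    f2 = sum(1 for n in observed if n == 2)      # doubletons
    if f2 == 0:
        # Assumption of this sketch: refuse rather than guess a fallback.
        raise ValueError("formula is undefined without doubletons")
    return s_obs + f1 ** 2 / (2 * f2)


# Four observed species, two singletons, one doubleton: 4 + 4/2
print(chao_estimate([1, 1, 2, 5]))
```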