10.4.2. Dispersion in statistical models

For a binomial distribution, the variance np(1-p) depends on the mean np. When the variance of the observations is larger or smaller than this expected variance, the data are said to show over- or under-dispersion. Both types can be detected in the goodness-of-fit statistics of a fitted model through the ratio of the residual deviance to the residual degrees of freedom: values appreciably larger than 1 indicate over-dispersion and values lower than 1 indicate under-dispersion. Either type can strongly affect, and even invalidate, hypothesis testing based on the model (standard errors, confidence intervals and p-values); see Twisk (2010), Zuur et al. (2009), Hardin and Hilbe (2007) and Myers et al. (2002) for examples. Under- or over-dispersion can arise from the frequency characteristics of the data, for instance when relatively small and large beekeepers/operations are present in different numbers (heterogeneity of the sample population). It can also arise when an important assumption of the binomial distribution, the independence of observations (independent Bernoulli trials), is violated: losses may not be independent but clustered through an unknown factor (e.g. effects of a certain location, or the incidence of pathogens) that cannot be included (properly) in the model.
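The deviance-to-degrees-of-freedom check can be illustrated with a minimal Python sketch. It assumes the simplest possible model, an intercept-only binomial model in which every operation shares one pooled loss rate; the function and argument names (`binomial_deviance`, `dispersion_ratio`, `losses`, `colonies`) are hypothetical and the data are made up. In practice the ratio would be read from the output of the GLM software used to fit the full model.

```python
import math


def binomial_deviance(losses, colonies, p_hat):
    """Residual deviance of a binomial model with fitted loss probabilities p_hat.

    Uses the convention 0 * log(0) = 0 for operations with no loss or total loss.
    """
    dev = 0.0
    for y, n, p in zip(losses, colonies, p_hat):
        mu = n * p  # expected number of lost colonies in this operation
        if y > 0:
            dev += y * math.log(y / mu)
        if n - y > 0:
            dev += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * dev


def dispersion_ratio(losses, colonies):
    """Residual deviance / residual df for a (hypothetical) intercept-only binomial model."""
    p = sum(losses) / sum(colonies)  # pooled loss rate, the maximum likelihood estimate
    p_hat = [p] * len(losses)
    df = len(losses) - 1  # one parameter (the intercept) has been estimated
    return binomial_deviance(losses, colonies, p_hat) / df


# Hypothetical data: lost colonies and colonies kept per operation
same_rate = dispersion_ratio([10, 20, 30], [100, 200, 300])      # every operation loses 10 %
clustered = dispersion_ratio([0, 50, 5, 90], [100, 100, 100, 100])
print(same_rate, clustered)
```

Identical loss proportions in every operation give a ratio of 0, the under-dispersed extreme in which the data vary less than binomial sampling alone would produce, whereas losses concentrated in a few operations push the ratio far above 1.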
When under- or over-dispersion is not reduced by including the most significant model factors derived from the data and/or by stratifying the available data according to binomial trial size, the solution is to use a different distribution for the dependent variable. A suitable candidate is the quasi-binomial distribution, in which the variance is characterised by adding an extra parameter to the binomial distribution, so that hypothesis testing can be corrected for the extra-binomial variance. The form of the quasi-binomial probability distribution is:


See the manual by Kindt and Coe (2005), available online, for an excellent example of the use of a quasi-binomial distribution and of how it differs from the standard binomial distribution. An excess of zero values (no loss) can be a cause of over-dispersion. To investigate the relation between predictor variables and the presence of zero values (no loss), zero-inflation techniques can be used (see, for example, Hall (2000)).
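The practical difference between a binomial and a quasi-binomial fit, together with a quick check for excess zeros, can be sketched as follows. This is a minimal illustration under stated assumptions: again an intercept-only model; the dispersion parameter φ estimated as the Pearson chi-square statistic divided by the residual degrees of freedom (one common estimator, with the deviance-based ratio as an alternative); and standard errors inflated by the square root of φ, which is the usual quasi-likelihood correction. All function names and data are hypothetical. The zero check only compares observed with expected numbers of zero-loss operations; it is a diagnostic, not a zero-inflated model.

```python
import math


def pooled_fit(losses, colonies):
    """Intercept-only binomial fit: pooled loss rate and Pearson dispersion estimate.

    Assumes 0 < pooled loss rate < 1 (otherwise the binomial variance is zero).
    """
    p = sum(losses) / sum(colonies)
    # Pearson chi-square: squared residuals scaled by the binomial variance n p (1 - p)
    chi2 = sum((y - n * p) ** 2 / (n * p * (1 - p)) for y, n in zip(losses, colonies))
    phi = chi2 / (len(losses) - 1)  # estimated extra-binomial dispersion parameter
    return p, phi


def logit_se(losses, colonies):
    """Standard error of the fitted logit(p) under binomial and quasi-binomial assumptions."""
    p, phi = pooled_fit(losses, colonies)
    se_binomial = 1.0 / math.sqrt(sum(colonies) * p * (1 - p))
    se_quasi = math.sqrt(phi) * se_binomial  # quasi-likelihood: inflate SE by sqrt(phi)
    return se_binomial, se_quasi


def zero_check(losses, colonies):
    """Observed vs expected number of zero-loss operations under the pooled binomial fit."""
    p = sum(losses) / sum(colonies)
    observed = sum(1 for y in losses if y == 0)
    expected = sum((1 - p) ** n for n in colonies)  # P(no loss) = (1 - p)^n per operation
    return observed, expected


losses = [0, 0, 0, 1, 12, 2, 0, 9]            # hypothetical lost colonies per operation
colonies = [10, 12, 8, 20, 40, 15, 10, 30]    # hypothetical colonies per operation

p, phi = pooled_fit(losses, colonies)
se_b, se_q = logit_se(losses, colonies)
obs0, exp0 = zero_check(losses, colonies)
print(f"p = {p:.3f}, phi = {phi:.2f}")
print(f"SE(logit p): binomial {se_b:.3f}, quasi-binomial {se_q:.3f}")
print(f"zero-loss operations: observed {obs0}, expected {exp0:.2f}")
```

With these made-up data φ is well above 1, so the quasi-binomial standard error, and hence any confidence interval for the loss rate, is wider than the binomial one, and there are clearly more zero-loss operations than the binomial model predicts.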