# 10.4.2. Dispersion in statistical models

For a binomial distribution, the variance *np(1-p)* is determined by the mean *np*. When the variance
observed in the data is larger or smaller than this expected variance, the data
are said to show over- or under-dispersion, respectively. Either type can be
detected in a fitted model from the ratio of the residual deviance to the
residual degrees of freedom: values appreciably larger than 1 indicate
over-dispersion and values appreciably lower than 1 indicate
under-dispersion. Both types can strongly affect, and even invalidate, model
hypothesis testing (standard errors, confidence intervals and p-values). See
Twisk (2010), Zuur *et al.* (2009),
Hardin and Hilbe (2007) and Myers *et al.*
(2002) for examples. Causes of under- or over-dispersion can be related to the
frequency characteristics of the data, with relatively small and large
beekeepers/operations present in different numbers (heterogeneity of the sample
population). An important assumption of the binomial distribution, namely
independence of observations (independent Bernoulli trials), may also be violated
when losses are clustered rather than independent because of an unknown factor
(e.g. effects of a certain location, incidence of pathogens) that cannot be
(properly) included in the model.
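
The dispersion check described above can be illustrated with a minimal sketch. It assumes grouped binomial data (colonies lost out of colonies kept per operation) and the simplest possible model, a single pooled loss probability. The data values and the function names `binomial_deviance` and `dispersion_ratio` are hypothetical and serve only as an illustration.

```python
import math

def binomial_deviance(losses, sizes, fitted_p):
    """Residual deviance of a binomial model for grouped data.

    Assumes every fitted probability lies strictly between 0 and 1.
    """
    def term(observed, expected):
        # Convention: 0 * log(0) is taken as 0.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    total = 0.0
    for y, n, p in zip(losses, sizes, fitted_p):
        mu = n * p  # expected number of losses
        total += term(y, mu) + term(n - y, n - mu)
    return 2.0 * total

def dispersion_ratio(losses, sizes, fitted_p, n_params):
    """Residual deviance divided by residual degrees of freedom."""
    df = len(losses) - n_params
    return binomial_deviance(losses, sizes, fitted_p) / df

# Hypothetical data: colonies kept and colonies lost for eight operations.
sizes = [10, 12, 8, 50, 40, 15, 9, 60]
losses = [0, 6, 1, 5, 20, 0, 4, 12]

# Intercept-only model: one pooled loss probability (one estimated parameter).
p_hat = sum(losses) / sum(sizes)
ratio = dispersion_ratio(losses, sizes, [p_hat] * len(sizes), n_params=1)
print(f"deviance / df = {ratio:.2f}")
```

For these made-up numbers the ratio is well above 1, which would point to over-dispersion relative to the pooled binomial model. With a model that includes covariates, the same ratio would be computed from that model's fitted probabilities and its number of estimated parameters.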

When under- or over-dispersion
is not reduced by including the most significant model factors derived from the
data and/or by stratifying the available data according to binomial trial size, a
solution is to use a different distribution for the dependent variable. A
suitable candidate is the quasi-binomial distribution, in which an additional
parameter is added to the binomial distribution to characterise the variance,
so that hypothesis testing can be corrected for the extra-binomial variance. The
form of the quasi-binomial probability distribution is:

See the online manual by Kindt and Coe (2005) for an excellent example of the use of a quasi-binomial distribution and of how it differs from the standard binomial distribution. An excess of zero values (no loss) can also cause over-dispersion. To investigate the relation between predictor variables and the presence of zero values (no loss), zero-inflation techniques can be used (see, for example, Hall (2000)).
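
The correction of hypothesis testing for extra-binomial variance can be sketched as follows. The sketch assumes the quasi-likelihood formulation commonly used in generalised linear model software, in which the variance is taken to be *φnp(1-p)* with a dispersion parameter *φ* estimated as the Pearson chi-square statistic divided by the residual degrees of freedom. This may differ from the probability distribution referred to above. The data and the function names `pearson_dispersion` and `pooled_se` are hypothetical.

```python
import math

# Hypothetical data: colonies kept and colonies lost for eight operations.
sizes = [10, 12, 8, 50, 40, 15, 9, 60]
losses = [0, 6, 1, 5, 20, 0, 4, 12]

def pearson_dispersion(losses, sizes, fitted_p, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom."""
    chi2 = sum(
        (y - n * p) ** 2 / (n * p * (1.0 - p))
        for y, n, p in zip(losses, sizes, fitted_p)
    )
    return chi2 / (len(losses) - n_params)

def pooled_se(losses, sizes, phi=1.0):
    """Standard error of the pooled loss proportion.

    phi = 1 gives the plain binomial standard error; phi != 1 rescales
    the variance as assumed under the quasi-binomial approach.
    """
    total = sum(sizes)
    p = sum(losses) / total
    return math.sqrt(phi * p * (1.0 - p) / total)

# Intercept-only model: one pooled loss probability (one estimated parameter).
p_hat = sum(losses) / sum(sizes)
phi_hat = pearson_dispersion(losses, sizes, [p_hat] * len(sizes), n_params=1)

se_binomial = pooled_se(losses, sizes)
se_quasi = pooled_se(losses, sizes, phi=phi_hat)
print(f"phi = {phi_hat:.2f}")
print(f"binomial SE = {se_binomial:.4f}, quasi-binomial SE = {se_quasi:.4f}")
```

Under this assumption the point estimate of the loss proportion is unchanged; only the standard error is multiplied by the square root of the estimated dispersion, which widens confidence intervals when the data are over-dispersed and narrows them when they are under-dispersed.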