# 6.1.2. Random sampling

Survey designs based on random sampling select sampling units from the population with known probabilities. This means that the sampling properties of estimators of population quantities can be determined, such as whether the estimator is unbiased (i.e., does it give the right answer on average?) and what its precision is (i.e., how do we calculate its variance or its standard error?).

This is the objective scientific approach to sampling and the only one for which sampling properties of estimators are known. Other methods may provide good information but there is no guarantee that they will, and their sampling properties are unknown. However, even with random sampling, poor response rates introduce the possibility of non-response bias, which can compromise the estimation.

Implementation of random sampling methods requires a mechanism for random selection, usually a random number generator in computer software, e.g. the “sample” function in the free software environment R (downloadable from http://www.r-project.org/). It also usually requires a sampling frame, i.e. a list of the sampling units in the population (section 6.2).

The simplest scheme is *simple random sampling*, which samples randomly
without replacement from the sampling
frame so that at every stage every sampling unit not already selected from the
sampling frame is equally likely to be chosen. This results in all samples of a
given size being equally likely to be selected.
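
As a minimal sketch of simple random sampling, the Python snippet below draws a sample without replacement using the standard library's `random.sample`, which plays a role analogous to R's “sample” function. The sampling frame, the identifiers and the function name `simple_random_sample` are hypothetical, introduced only for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(frame), n)


# Hypothetical sampling frame of 200 beekeeper identifiers.
frame = [f"beekeeper_{i:03d}" for i in range(1, 201)]
srs = simple_random_sample(frame, n=20, seed=1)
print(srs)
```

Recording the seed alongside the survey documentation allows the selection to be reproduced later.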

*Systematic sampling* is sometimes used as a simple alternative to simple random sampling and
works at least as well in situations where the population sampled from is
"randomly ordered" with respect to the value of a quantity being
measured or recorded, or is ordered by the size of such a quantity. It
does not always require a sampling frame. For example, if 1000 beekeepers
attend a convention, a 10% sample of those attending can be obtained by
selecting a participant at random from the first 10 beekeepers to arrive or
register, and then selecting every 10th person after that.
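
The convention example can be sketched in a few lines of Python. This is an illustration only: the arrival numbers and the function name `systematic_sample` are assumptions, and the code presumes the units are already listed in arrival order.

```python
import random


def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k arrivals
    return units[start::k]


# Hypothetical arrival numbers 1..1000 for the convention attendees.
arrivals = list(range(1, 1001))
picked = systematic_sample(arrivals, k=10, seed=1)
print(len(picked), picked[:5])
```

With 1000 attendees and an interval of 10, any random start yields exactly 100 selected participants, the intended 10% sample.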

*Stratified sampling* splits the population into subgroups or *strata*, using stratification factors such as geographical area,
degree of experience of the beekeeper, or type of operation (beekeeper or bee farmer), which are
judged to be important in terms of coverage of the population and which are
likely to be related to the response variable(s) of interest. Then a random
sample, in the simplest case a *simple random sample*, is selected
separately from each stratum, using
predetermined sample sizes. This ensures representation of all these important
groups in the sample (which might not be achieved by a single simple random
sample), and the random sampling should compensate for any other relevant
stratification factors which may have been overlooked in the survey design. It
also allows comparison of the responses from each stratum, provided enough
responses are achieved in each stratum.
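
A minimal sketch of stratified simple random sampling follows, assuming the sampling frame has already been split by stratum. The strata names, their sizes, the predetermined sample sizes and the function name `stratified_sample` are all hypothetical.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of predetermined size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


# Hypothetical frame, split by geographical area.
strata = {
    "north": [f"N{i}" for i in range(120)],
    "south": [f"S{i}" for i in range(60)],
    "islands": [f"I{i}" for i in range(20)],
}
sizes = {"north": 12, "south": 6, "islands": 4}  # predetermined sample sizes

selected = stratified_sample(strata, sizes, seed=1)
for name, units in selected.items():
    print(name, len(units))
```

Because each stratum is sampled separately, even the smallest stratum is guaranteed to appear in the sample, which a single simple random sample of the same total size would not ensure.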

If the average responses do differ between the strata, and/or the
variation in recorded responses differs between strata, stratified sampling
should provide estimates with a lower variance than simple random sampling. The
lower variance is achieved because separate samples have been taken from
populations with smaller variation within them compared to the population as a
whole (Schaeffer *et al.*, 1990).
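
The variance argument can be illustrated with the standard textbook estimator for a stratified simple random sample. The notation is an assumption introduced here rather than taken from the text: there are $H$ strata, stratum $h$ contains $N_h$ of the $N$ population units, $n_h$ of them are sampled, and $\bar{y}_h$ and $s_h^2$ are the sample mean and sample variance within stratum $h$.

$$
\bar{y}_{st} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h,
\qquad
\widehat{\mathrm{Var}}(\bar{y}_{st}) = \sum_{h=1}^{H} \left(\frac{N_h}{N}\right)^{2} \left(1 - \frac{n_h}{N_h}\right) \frac{s_h^{2}}{n_h}
$$

Only the within-stratum variances $s_h^2$ enter the estimated variance, so differences between the stratum means do not inflate it.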

One basis for stratified sampling is
operation size. The scale of beekeeping operations and the management practices used are
very different for hobbyist beekeepers and professional/commercial operators
(bee farmers). Because the numbers of colonies lost and the
consequences of losses may differ between these two groups, both should be included whenever
possible in a survey. This allows the colony loss rates experienced by both
groups to be compared and makes the survey more representative of overall levels of loss. Box 4 gives an
example.

In Scotland, any beekeeper (or other person interested in bees) can choose to become a member of the Scottish Beekeepers’ Association, while there is a separate Bee Farmers’ Association for the UK, the qualification for the latter being that the beekeeper should keep at least 40 colonies of honey bees within the UK. There is known to be some overlap between the two membership lists, and care needs to be taken not to request survey participation of the same person twice for the same survey. Despite the fact that there are far fewer bee farmers than hobbyist beekeepers in Scotland, it is clear that they manage more than half the managed colonies, so that their contribution to the overall bee population is far greater than their numbers would suggest. Therefore in a recent survey it was decided to sample all of the bee farmers who could be identified, while selecting a random sample of non-commercial beekeepers (Gray and Peterson, 2012).
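
The kind of design described in this example (a census of one small stratum, a random sample of the other, and removal of people who appear on both lists) can be sketched as below. This is an illustration under assumed data, not the procedure used in the cited survey; the membership lists, identifiers and the function name `census_plus_sample` are hypothetical.

```python
import random


def census_plus_sample(bee_farmers, association_members, n, seed=None):
    """Include every bee farmer; draw a simple random sample of n others."""
    rng = random.Random(seed)
    census = sorted(set(bee_farmers))
    # Anyone on both lists stays in the bee farmer stratum only,
    # so that nobody is asked to take part twice.
    others = sorted(set(association_members) - set(census))
    return census, rng.sample(others, n)


# Hypothetical membership lists with one person ("F2") on both.
bee_farmers = ["F1", "F2", "F3"]
association_members = [f"M{i}" for i in range(1, 51)] + ["F2"]

census, sampled = census_plus_sample(bee_farmers, association_members, n=10, seed=1)
print(census, sampled)
```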

The migration of colonies (the movement of colonies to/from nectar flows
or for purposes of crop pollination) differs widely between beekeeping
operations. Therefore, it is also desirable to consider different classes of
migratory practice where possible when designing and analysing the survey. As
migration may be a factor in loss rates (although see VanEngelsdorp *et al.*, 2010), comparing migratory and
non-migratory beekeepers is important, if the sample sizes permit valid
comparisons. In places where there is widespread practice of migration on a
large scale, this comparison becomes much more important. Identifying
beekeepers practising migration of bees in advance of drawing a sample may be
difficult, unless auxiliary sources like membership records include this information. If
this information is available, then a stratified approach may be adopted to
ensure coverage of both migratory and non-migratory beekeepers.

Geographical stratification may also be important, especially if different regions are subject to different weather conditions and differing exposure to bee diseases. However, combining multiple stratification factors with lower-than-ideal response rates can leave too few responses in each stratum, making the desired comparisons statistically invalid or impossible.

*Cluster sampling* is the other main method of probability sampling. If the population can
be divided into convenient groups of population elements rather than strata
thought to differ in ways relevant to the response(s) of interest, then
randomly selecting a few of the groups and including everyone in those selected
groups as part of the sample will provide a representative sample from the
whole population if the groups or *clusters* are representative of the
population. For example, these clusters might
consist of local beekeeping associations, which would be viewed as groups of
beekeepers. This is a one-stage cluster sample design. There are other variants
of this method, but they are unlikely to be of practical importance in this
field of application.
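
A one-stage cluster sample can be sketched as follows, assuming membership lists are available for each local association. The association names, their members and the function name `one_stage_cluster_sample` are hypothetical.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical local associations and their members.
associations = {
    "association_A": [f"A{i}" for i in range(15)],
    "association_B": [f"B{i}" for i in range(30)],
    "association_C": [f"C{i}" for i in range(8)],
    "association_D": [f"D{i}" for i in range(22)],
    "association_E": [f"E{i}" for i in range(12)],
}

chosen, members = one_stage_cluster_sample(associations, n_clusters=2, seed=1)
print(chosen, len(members))
```

Note that the total sample size is not fixed in advance here: it depends on which associations happen to be selected.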