
# 6.1.2. Random sampling

Survey designs based on random sampling select sampling units from the population with known probabilities. This means that the sampling properties of estimators of population quantities can be determined, such as whether the estimator is unbiased (i.e., does it give the right answer on average?) and how precise it is (i.e., what is its variance or standard error?).

This is the objective scientific approach to sampling and the only one for which the sampling properties of estimators are known. Other methods may provide good information, but there is no guarantee that they will, and their sampling properties are unknown. However, even with random sampling, poor response rates raise the possibility of non-response bias, which will compromise the estimation.

Implementation of random sampling methods requires a mechanism for random selection, usually accomplished by use of random number generators in computer software, e.g. the “sample” function in the free software R (downloadable from http://www.r-project.org/). It also usually requires a sampling frame, or list of sampling units in the population (section 6.2).

The simplest scheme is simple random sampling, which samples randomly without replacement from the sampling frame so that at every stage every sampling unit not already selected from the sampling frame is equally likely to be chosen. This results in all samples of a given size being equally likely to be selected.
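As a minimal sketch of simple random sampling, the following Python snippet draws 100 units without replacement from a sampling frame. The frame contents, the sample size and the seed are hypothetical values chosen for illustration; `random.sample` from the Python standard library plays the same role here as the R function mentioned above.

```python
import random

# Hypothetical sampling frame: one identifier per beekeeper (illustrative only).
frame = [f"beekeeper_{i:04d}" for i in range(1, 1001)]

# A fixed seed makes the selection reproducible and auditable.
rng = random.Random(2012)

# random.sample draws without replacement, so no unit is selected twice
# and every subset of 100 units is equally likely to be chosen.
srs = rng.sample(frame, k=100)

print(len(srs), "units selected; first three:", srs[:3])
```

Recording the seed alongside the frame allows the same sample to be regenerated later if the selection needs to be checked.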

Systematic sampling is sometimes used as a simple alternative to simple random sampling. It works at least as well in situations where the population sampled from is "randomly ordered" with respect to the value of a quantity being measured or recorded, or is ordered by the size of such a quantity. It does not always require a sampling frame. For example, if 1000 beekeepers attend a convention, a 10% sample of those attending can be achieved by selecting a participant at random from the first 10 beekeepers to arrive or register, and then selecting every 10th person after that.
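The convention example can be sketched in a few lines of Python. The function name `systematic_sample` and the seed are hypothetical; the positions returned are the 1-based order of arrival or registration.

```python
import random

def systematic_sample(n_units, interval, rng):
    """Return 1-based positions: a random start within the first
    `interval` units, then every `interval`-th unit after that."""
    start = rng.randint(1, interval)  # randint includes both end points
    return list(range(start, n_units + 1, interval))

rng = random.Random(7)  # illustrative seed
positions = systematic_sample(n_units=1000, interval=10, rng=rng)

print(len(positions), "attendees selected, starting at position", positions[0])
```

Only the starting position is random; once it is fixed, the rest of the sample is determined, which is why the ordering of the population matters for this method.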

Stratified sampling splits the population into subgroups or strata, using stratification factors such as geographical area, degree of experience of the beekeeper, or the distinction between beekeepers and bee farmers. These factors are judged to be important in terms of coverage of the population and are likely to be related to the response variable(s) of interest. Then a random sample, in the simplest case a simple random sample, is selected separately from each stratum, using predetermined sample sizes. This ensures representation of all these important groups in the sample (which might not be achieved by a single simple random sample), and the random sampling should compensate for any other relevant stratification factors which may have been overlooked in the survey design. It also allows comparison of the responses from each stratum, provided enough responses are achieved in each stratum.
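A minimal sketch of stratified selection, assuming two strata with made-up sizes and predetermined sample sizes (the stratum names, counts and the helper `stratified_sample` are all hypothetical):

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample of sizes[name] units,
    without replacement, separately from each stratum."""
    return {name: rng.sample(units, k=sizes[name])
            for name, units in strata.items()}

# Hypothetical strata and sample sizes, for illustration only.
strata = {
    "hobbyist": [f"hobbyist_{i}" for i in range(1, 901)],
    "bee_farmer": [f"farmer_{i}" for i in range(1, 101)],
}
sizes = {"hobbyist": 90, "bee_farmer": 30}

rng = random.Random(42)
sample = stratified_sample(strata, sizes, rng)

for name, units in sample.items():
    print(name, len(units))
```

Because each stratum is sampled independently, the sampling fraction can differ between strata, as it does here (10% of hobbyists, 30% of bee farmers).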

If the average responses do differ between the strata, and/or the variation in recorded responses differs between strata, stratified sampling should provide estimates with a lower variance than simple random sampling. The lower variance is achieved because separate samples have been taken from populations with smaller variation within them compared to the population as a whole (Schaeffer et al., 1990).
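The standard textbook expressions make this concrete. The notation is introduced here for illustration and is not taken from the source: there are $H$ strata, stratum $h$ contains $N_h$ of the $N$ population units, $n_h$ of them are sampled by simple random sampling, $\bar{y}_h$ is the sample mean in stratum $h$, and $S_h^2$ is the variance of the response within stratum $h$.

$$
\bar{y}_{st} = \sum_{h=1}^{H} W_h \, \bar{y}_h , \qquad W_h = \frac{N_h}{N},
$$

$$
\operatorname{Var}(\bar{y}_{st}) = \sum_{h=1}^{H} W_h^{2} \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^{2}}{n_h} .
$$

Only the within-stratum variances $S_h^2$ appear in the variance of the stratified estimator, so differences between the stratum means do not contribute to it.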

One basis for stratified sampling is operation size. The scale of beekeeping operations and management practices are very different for hobbyist beekeepers and professional/commercial operators (bee farmers). Because the numbers of lost colonies and the consequences of losses may differ between these two groups, both should be included in a survey whenever possible. This allows the colony loss rates experienced by both groups to be compared and is more representative of overall levels of loss. Box 4 gives an example.

Box 4. Example: Case study of stratified random sample selection. In Scotland, any beekeeper (or other person interested in bees) can choose to become a member of the Scottish Beekeepers’ Association, while there is a separate Bee Farmers’ Association for the UK, the qualification for the latter being that the beekeeper should keep at least 40 colonies of honey bees within the UK. There is known to be some overlap between the two membership lists, and care needs to be taken not to request survey participation of the same person twice for the same survey. Despite the fact that there are far fewer bee farmers than hobbyist beekeepers in Scotland, it is clear that they manage more than half the managed colonies, so that their contribution to the overall bee population is far greater than their numbers would suggest. Therefore in a recent survey it was decided to sample all of the bee farmers who could be identified, while selecting a random sample of non-commercial beekeepers (Gray and Peterson, 2012).

The migration of colonies (the movement of colonies to/from nectar flows or for purposes of crop pollination) differs widely between beekeeping operations. Therefore, it is also desirable to consider different classes of migratory practice where possible when designing and analysing the survey. As migration may be a factor in loss rates (although see VanEngelsdorp et al., 2010), comparing migratory and non-migratory beekeepers is important, if the sample sizes permit valid comparisons. In places where there is widespread practice of migration on a large scale, this comparison becomes much more important. Identifying beekeepers practising migration of bees in advance of drawing a sample may be difficult, unless auxiliary sources like membership records include this information. If this information is available, then a stratified approach may be adopted to ensure coverage of both migratory and non-migratory beekeepers.

Geographical stratification may also be important, especially if different regions are subject to different weather conditions and differing exposure to bee diseases. However, combining multiple stratification factors with lower than ideal response rates can make the desired comparisons statistically invalid or impossible due to small samples.

Cluster sampling is the other main method of probability sampling. It applies when the population can be divided into convenient groups of population elements, rather than into strata thought to differ in ways relevant to the response(s) of interest. Randomly selecting a few of the groups, and including everyone in the selected groups in the sample, will provide a representative sample from the whole population if the groups or clusters are themselves representative of the population. For example, these clusters might consist of local beekeeping associations, which would be viewed as groups of beekeepers. This is a one-stage cluster sample design. There are other variants of this method, but they are unlikely to be of practical importance in this field of application.
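A one-stage cluster sample can be sketched as follows, assuming 30 hypothetical local associations of 20 members each (the names, counts and seed are invented for illustration):

```python
import random

# Hypothetical clusters: each local association maps to its list of members.
associations = {
    f"association_{a}": [f"association_{a}_member_{m}" for m in range(1, 21)]
    for a in range(1, 31)
}

rng = random.Random(3)  # illustrative seed

# Stage 1 (the only stage): select whole associations at random.
chosen = rng.sample(sorted(associations), k=5)

# Every member of each selected association enters the sample.
cluster_sample = [member for name in chosen for member in associations[name]]

print(chosen)
print(len(cluster_sample), "beekeepers in the sample")
```

The random selection is applied to the associations, not to individual beekeepers, so only a list of associations is needed before sampling begins.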