9. Choice of sample size

In a probability-based sample, the sample size can be calculated statistically in order to achieve a required level of precision of estimates from the data collected, where these estimates have been identified in advance as being of interest. The formulae required depend on the sampling scheme to be used. Schaeffer et al. (1990) give details.

For example, in a simple random sample, to estimate a mean (e.g. the average number of colonies kept per beekeeper) to within an error bound $B$ of the correct value with approximately 95% confidence, the formula for the sample size is $n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$, where $D = B^2/4$, $\sigma^2$ is the variance in the population of the quantity of interest (e.g. the number of colonies kept), and $N$ is the population size. In the case of a very large population of beekeepers, where $N$ is not known exactly, an approximation to this sample size is given by $n \approx \sigma^2/D = 4\sigma^2/B^2$. The population variance may be estimated from the variance calculated from data in a previous survey of the same population, or from a pilot survey. Estimating a total (by the population size $N$ times the sample average) with the same precision uses the same formula but with $D = B^2/(4N^2)$. Box 10 provides an example of the calculations.

Box 10. Sample size calculation for a survey to estimate a mean or a total.

For example, using a simple random sampling approach, to estimate the average number of colonies kept to within an error bound of $B = 0.10$ of the true value with an approximate confidence level of 95%, the sample size is calculated as follows. We use the formula $n = \frac{N\sigma^2}{(N-1)D + \sigma^2}$, where $D = B^2/4 = 0.10^2/4 = 0.0025$. Assuming that the total number of beekeepers in the population is 1500, and if we have recent information from a previous survey that the variance $\sigma^2$ of the number of colonies per beekeeper is about 4, then we should sample $n = \frac{1500 \times 4}{1499 \times 0.0025 + 4} \approx 774.4$, i.e. 775 beekeepers, rounding up to the nearest integer. If we wished to estimate the total number of colonies kept, say to within 200 of the actual total with the same level of confidence, then making use of the same information, we calculate instead $D = B^2/(4N^2) = 200^2/(4 \times 1500^2) \approx 0.00444$, which now gives $n \approx 562.7$, i.e. 563 beekeepers to be sampled.
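The Box 10 arithmetic can be checked with a short Python sketch. The function name `srs_sample_size` and its parameters are illustrative choices, not part of the original text; the formula is the simple random sampling one given above, with the large-population approximation used when no population size is supplied.

```python
import math


def srs_sample_size(variance, D, N=None):
    """Sample size for a simple random sample (hypothetical helper).

    variance : assumed population variance (sigma squared)
    D        : B**2 / 4 for a mean, or B**2 / (4 * N**2) for a total
    N        : population size; if None, use the large-population
               approximation n = variance / D
    """
    if N is None:
        n = variance / D
    else:
        n = N * variance / ((N - 1) * D + variance)
    # Round away floating-point noise before rounding up to an integer.
    return math.ceil(round(n, 6))


N = 1500          # number of beekeepers, as assumed in Box 10
variance = 4.0    # variance of colonies per beekeeper, as assumed in Box 10

n_mean = srs_sample_size(variance, D=0.10**2 / 4, N=N)
n_total = srs_sample_size(variance, D=200**2 / (4 * N**2), N=N)
print(n_mean, n_total)  # 775 563
```

Rounding up rather than to the nearest integer keeps the achieved error bound at or below the target.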
To estimate a proportion $p$ to within an error bound $B$ of the true value with approximately 95% confidence, the same exact and approximate formulae are used as for estimating a mean, but with $\sigma^2 = p(1-p)$, so in the large population case $n \approx 4p(1-p)/B^2$. These formulae require an approximate value for $p$ based on prior experience, or else substitution of the conservative value $p = 0.5$ to maximise the required sample size. Box 11 shows the calculations.

Box 11. Sample size calculation for a survey to estimate a proportion.

For example, using a simple random sampling approach, to estimate an overall proportion of losses, which was about 20% last year (so $p = 0.20$ approximately), to within a margin of error of 5% ($B = 0.05$) of the true value with an approximate confidence level of 95%, the sample size is calculated as follows. The population size is assumed large, but is unknown. So we use the large population version of the sample size formula for estimation of a proportion, given by $n \approx 4p(1-p)/B^2$. Here this gives $n = 4 \times 0.20 \times 0.80 / 0.05^2 = 0.64/0.0025$, giving $n = 256$ exactly. So the sample should be composed of at least 256 individuals to achieve the required level of precision.
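As a check on Box 11, the following minimal Python sketch computes the sample size for a proportion. The helper name `proportion_sample_size` is hypothetical, and the finite population of 1500 in the last call is an assumed value used only to illustrate the exact formula.

```python
import math


def proportion_sample_size(p, B, N=None):
    """Sample size to estimate a proportion p to within B (hypothetical helper).

    Uses sigma**2 = p * (1 - p) and D = B**2 / 4. If N is None, the
    large-population approximation n = 4 * p * (1 - p) / B**2 is used.
    """
    variance = p * (1 - p)
    D = B**2 / 4
    if N is None:
        n = variance / D
    else:
        n = N * variance / ((N - 1) * D + variance)
    # Round away floating-point noise before rounding up to an integer.
    return math.ceil(round(n, 6))


n_prior = proportion_sample_size(0.20, 0.05)         # prior estimate of p
n_conservative = proportion_sample_size(0.50, 0.05)  # conservative p = 0.5
n_finite = proportion_sample_size(0.20, 0.05, N=1500)  # assumed N
print(n_prior, n_conservative, n_finite)  # 256 400 219
```

The conservative choice of 0.5 gives the largest sample size for a given error bound, which is why it is the safe default when no prior estimate exists.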


If there is more than one quantity to be estimated, as there will be in surveys of beekeepers, the largest of the relevant calculated sample sizes can be used, where this is feasible, or it can be decided to focus on the single most important estimator, e.g. the proportion of beekeepers experiencing winter colony loss or the proportion experiencing CDS losses. It is then accepted that any other estimates requiring a larger sample size will be estimated with lower precision than is desirable.

For a stratified sample, which takes simple random samples from each stratum, similar calculations may be done to obtain the overall sample size required to estimate the mean or total or proportion to within an error bound B of the true value with approximately 95% confidence. See Schaeffer et al. (1990), for example, for details.

Various approaches are possible to divide the chosen sample size between the strata, including the proportional method, which takes the sample size $n_i$ in the $i$th stratum proportional to $N_i/N$, where $N_i$ is the size of the $i$th stratum and $N$ is the population size. This means taking $n_i = nW_i$, where $W_i = N_i/N$ is the $i$th stratum weight, or the proportion of the population belonging to stratum $i$.
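Proportional allocation can be sketched in a few lines of Python. The function name, the three stratum sizes and the overall sample size of 300 are all hypothetical values chosen for illustration.

```python
def proportional_allocation(n, stratum_sizes):
    """Split an overall sample size n in proportion to stratum sizes.

    Returns unrounded stratum sample sizes n_i = n * N_i / N.
    """
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]


# Hypothetical strata, e.g. three regions with 900, 450 and 150 beekeepers.
stratum_sizes = [900, 450, 150]
allocation = proportional_allocation(300, stratum_sizes)
print(allocation)  # [180.0, 90.0, 30.0]
```

In practice the values will rarely be whole numbers, so they need rounding, and the rounded values may not sum exactly to the chosen overall sample size.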

Neyman allocation is a more complex method which splits the sample between strata in order to minimise the variance of the unbiased estimator of the population mean (given by $\bar{y}_{st} = \sum_i W_i \bar{y}_i$, where $W_i = N_i/N$ and $\bar{y}_i$ is the mean of the sample from stratum $i$) or of the total (taken as $N$ times the estimator for the mean), by taking the $i$th stratum sample size $n_i$ proportional to $N_i\sigma_i$ or $W_i\sigma_i$, where $\sigma_i^2$ is the variance within stratum $i$ and $\sigma_i$ is the standard deviation within stratum $i$. So

$$n_i = n\,\frac{N_i\sigma_i}{\sum_k N_k\sigma_k}.$$

The within stratum variances may be estimated from previous experience or a pilot survey.

To estimate a proportion (by $\hat{p}_{st} = \sum_i W_i \hat{p}_i$, where $\hat{p}_i$ is the sample proportion in stratum $i$), the same formula can be used for allocation as for estimating a mean, but $\sigma_i$ is replaced by $\sqrt{p_i(1-p_i)}$, where $p_i$ is the value of the population proportion in stratum $i$ (in practice, an estimate of this is used).

The Neyman approach can also be modified, if required, to incorporate different sampling costs for each stratum. More complex modified Neyman allocation schemes are also possible (Särndal et al., 1992).
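The Neyman allocation described above can be illustrated with a minimal Python sketch. The function name and all numerical inputs are hypothetical. The optional `costs` argument uses the common textbook cost-weighted form, in which each stratum's share is divided by the square root of its per-unit sampling cost; that form is an assumption here and is not spelled out in this section.

```python
import math


def neyman_allocation(n, stratum_sizes, stratum_sds, costs=None):
    """Split n between strata in proportion to N_i * sigma_i.

    costs : optional per-unit sampling costs c_i; if given, the share
            becomes proportional to N_i * sigma_i / sqrt(c_i)
            (assumed textbook form, not taken from this section).
    Returns unrounded stratum sample sizes.
    """
    if costs is None:
        costs = [1.0] * len(stratum_sizes)
    shares = [
        N_i * sd_i / math.sqrt(c_i)
        for N_i, sd_i, c_i in zip(stratum_sizes, stratum_sds, costs)
    ]
    total = sum(shares)
    return [n * share / total for share in shares]


sizes = [900, 450, 150]   # hypothetical stratum sizes
sds = [1.0, 2.0, 6.0]     # hypothetical within-stratum standard deviations

alloc_mean = neyman_allocation(300, sizes, sds)
print(alloc_mean)  # [100.0, 100.0, 100.0]

# For a proportion, replace sigma_i by sqrt(p_i * (1 - p_i)).
props = [0.1, 0.2, 0.5]   # hypothetical stratum proportions
prop_sds = [math.sqrt(p * (1 - p)) for p in props]
alloc_prop = neyman_allocation(300, sizes, prop_sds)

# Costlier strata receive a smaller share under the cost-weighted form.
alloc_cost = neyman_allocation(300, sizes, sds, costs=[1.0, 4.0, 9.0])
```

In this example the smallest stratum receives as large a sample as the biggest one because its standard deviation is six times larger, which is the behaviour that distinguishes Neyman from proportional allocation.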

More generally it may be decided, in order to achieve a suitable coverage of the population, that a fixed percentage of the population should be sampled. For some of the COLOSS surveys, a guideline for acceptable coverage has been that, where possible, at least 5% of beekeepers should be surveyed. This is a simple way to choose sample size, especially in a non-probability sample for which sample size calculations are not valid.

Another concern in a smaller population which may be surveyed repeatedly is not to overburden individuals, but to maintain goodwill. This may mean taking a smaller sample than is ideal. Data processing concerns may also limit the sample size.

If the level of non-response can be anticipated, for example from recent experience, the calculated or chosen sample size can be increased accordingly, so that the achieved sample is still of the required size, as $n' = n/(1-r)$, where $n$ is the original sample size, $n'$ is the new size, and $r$ is the expected non-response rate as a proportion, e.g., equation43.
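The non-response adjustment amounts to dividing the required sample size by the expected response rate. A minimal Python sketch follows; the function name is hypothetical, and the 20% non-response rate and the starting size of 256 are assumed values for illustration only.

```python
import math


def inflate_for_nonresponse(n, nonresponse_rate):
    """Increase sample size n so that about n responses are still expected.

    nonresponse_rate is a proportion in [0, 1), e.g. 0.2 for 20%.
    """
    if not 0 <= nonresponse_rate < 1:
        raise ValueError("nonresponse_rate must be in [0, 1)")
    # Round away floating-point noise before rounding up to an integer.
    return math.ceil(round(n / (1 - nonresponse_rate), 6))


# Assumed inputs: 256 responses required, 20% expected non-response.
n_adjusted = inflate_for_nonresponse(256, 0.20)
print(n_adjusted)  # 320
```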

Obtaining standard errors of estimates, or confidence intervals, as part of the data analysis indicates how precisely the various quantities of interest have been estimated (see sections 4.1.2. and 10.).