Year : 2014  |  Volume : 2  |  Issue : 4  |  Page : 142-147

Sample size estimation and sampling techniques for selecting a representative sample

Department of Medical Education, Research Unit, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia

Date of Web Publication: 13-Oct-2014

Correspondence Address:
Aamir Omair
Department of Medical Education, Research Unit, King Saud bin Abdulaziz University for Health Sciences, Riyadh
Saudi Arabia

DOI: 10.4103/1658-600X.142783


Introduction: The purpose of this article is to provide a general understanding of the concepts of sampling as applied to health-related research. Sample Size Estimation: It is important to select a representative sample in quantitative research in order to be able to generalize the results to the target population. The sample should be of the required size and must be selected using an appropriate probability sampling technique. There are many hidden biases which can adversely affect the outcome of the study. Important factors to consider when estimating the sample size include the size of the study population, the confidence level, the expected proportion of the outcome variable (for categorical variables) or the standard deviation of the outcome variable (for numerical variables), and the required precision (margin of accuracy) of the study. The greater the precision required, the greater the required sample size. Sampling Techniques: The probability sampling techniques applied in health-related research include simple random sampling, systematic random sampling, stratified random sampling, cluster sampling, and multistage sampling. These are preferred over nonprobability sampling techniques because they allow the results of the study to be generalized to the target population.

Keywords: Sample, sample size, sampling techniques

How to cite this article:
Omair A. Sample size estimation and sampling techniques for selecting a representative sample. J Health Spec 2014;2:142-7

How to cite this URL:
Omair A. Sample size estimation and sampling techniques for selecting a representative sample. J Health Spec [serial online] 2014 [cited 2019 Jun 19];2:142-7. Available from: http://www.thejhs.org/text.asp?2014/2/4/142/142783

  Introduction

Selecting a sample that is representative of the general population is an important part of quantitative research. One of the major reasons that articles are rejected by good quality peer-reviewed journals is a nonrepresentative sample or an inadequate sample size. [1] The results from a poorly selected sample that differs from the target population cannot be applied to the general population. [2] A smaller than required sample size may not have sufficient power to identify significant differences or associations that may be present in the target population. The purpose of this article is to provide an overview of the importance of sampling in health research and to provide readers with useful tips and resources for selecting a representative sample.

The first step in understanding the sampling process is to be familiar with the terminology [Table 1]. A sample is a subset of the total population that is of interest for the study topic. This "total" population is called the target population, to which the results of the study can be generalized. [3] For example, the outcomes of a study based on patients admitted to a tertiary care hospital in a major city cannot be generalized to patients presenting with the same condition to other types of health care facilities in smaller towns. The sample itself is selected from the section of the target population that is accessible to the researcher, which is called the study population. [4] The study population may be as simple as a list of patients admitted with a certain disease, or it may be as obscure as all patients visiting any health care facility with different signs and symptoms.
Table 1: Sampling framework - from target population to the sample

Even if all of the available study population is selected, it is important to keep in mind that the study population may still be inherently different from the target population. [5] Patients who present with a specific disease to a tertiary health care facility are not representative of all the patients with that disease in that area. The difference may be in the severity of the disease or even in the demographics of the patients, depending on the type of health care facility, that is, Ministry of Health, military hospital, private hospital, etc. Patients who present to a specific health care center may be different from those who go to another health center or another health care provider. Hence, it is not advisable to generalize the results from a single hospital-based study to the whole city, let alone the entire country. [6],[7] Another important point to consider is that people who agree to participate in the study may be different from the nonresponders. In general, responders tend to be more health conscious and more literate, or they may be more likely to have a chronic condition rather than an acute exacerbation of the disease. Hence, the results of the study may differ from the outcome in the general population in either a positive or a negative direction, depending on how the responders differ from the nonresponders. [8]

It is important to show that the selected sample is representative of the study population and the target population with regard to the demographic and other relevant characteristics that may affect the outcome of the study. [9] For example, in a study comparing the outcomes of diabetic patients managed by an endocrinologist with those managed by family physicians, it is important to check that the two groups have the same socioeconomic characteristics with regard to age, gender, income, and education, since all of these are related to the outcome. [10] It is also important to consider the severity and duration of disease, since patients presenting to the endocrinologist may be more likely to already have complications due to diabetes mellitus. It is recommended to obtain the relevant demographic and background information from the responders to demonstrate that they are representative of the target/study population. [10] Additional information that can be easily obtained, such as area of residence, body mass index (BMI), and smoking status, should also be collected about the nonresponders and cases lost to follow-up. This will be useful to demonstrate that the responders are similar to the nonresponders with regard to these background variables. [11]

  Sample size estimation

A sample must be of the required size in order to provide the required degree of accuracy in the results as well as to be able to identify any significant difference or association that may be present in the study population. [12] Determining the minimum sample size required to achieve the main objectives of the study is of prime importance for all studies but is generally neglected by most novice researchers. A common practice is to select all the cases that are available (consecutive sampling) in a given period of time or to select a sample size based on a previous study. [13] Another practice is to select a sample of 50 or 100 patients depending upon the time and resources available. [14] While these practices may be adequate in some cases, they are generally not appropriate, especially for studies which require the comparison of two or more groups with respect to one or more outcomes of interest.

The factors that need to be considered when determining the required sample size include the size of the study population (from which the sample is to be selected), the confidence level (generally set at 95%), the expected prevalence or variance of the main outcome variable that is being studied, and the margin of error/accuracy that is acceptable for the study. [12],[15] In studies comparing two or more groups, the power of the study is generally set at 80%, and additional information regarding the expected difference between the groups is also required. [16] Nowadays, it is not necessary to look up difficult formulae and go through complicated calculations in order to determine the required sample size. There are a number of free online software packages and easily accessible websites, such as Open-Epi, [17] RaoSoft, [18] and Pi-face, [19] which can estimate a number of permutations of the required sample size based on the estimated parameters for the study population.

The researcher does need to do some preparation in advance before estimating the required sample size. The simplest scenario is a single sample study, where the prevalence of a specific variable is required in the study population, e.g., the prevalence of diabetes mellitus or its complications. The additional information needed to determine the required sample size includes the estimated size of the study population (if very large, use 20,000), the expected prevalence of the main variable (if unknown, use 50%), and the required margin of accuracy (generally set at 10% or 5%). [20] The margin of accuracy describes how close the sample result is required to be to the true population value; the more precise the required result, the greater the sample size required. Generally, for an expected prevalence of around 50% for the outcome variable, a margin of accuracy of ±10% requires a sample size of around 100, which increases to around 400 for ±5% and to around 10,000 for a ±1% margin of accuracy.
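The figures quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming the standard normal-approximation formula for a single proportion with an optional finite population correction; the article does not state which formula the cited online calculators use, and the function name and defaults here are illustrative only.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p=0.5, margin=0.05, confidence=0.95, population=None):
    """Approximate sample size for estimating a single proportion.

    Assumed formula (not taken from the article): n0 = z^2 * p * (1 - p) / margin^2,
    optionally adjusted with the finite population correction.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction for a study population of known size
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    for margin in (0.10, 0.05, 0.01):
        print(margin, sample_size_proportion(p=0.5, margin=margin))
    # Same 5% margin, but with a study population of 20,000
    print(sample_size_proportion(p=0.5, margin=0.05, population=20000))
```

With an expected prevalence of 50%, the uncorrected results are close to the rounded values of 100, 400, and 10,000 given in the text, and the correction for a study population of 20,000 reduces the requirement slightly.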

When determining the sample size for estimating the mean value of a numerical variable (e.g., BMI, cholesterol level, etc.), the additional information required is the expected variance of that variable in the target population. [21] This information can be obtained from a literature review of similar studies in the form of the standard deviation (SD) of the variable. The higher the SD, the greater the required sample size. If the SD is not known for the target population, it can be estimated by taking the difference between the estimated "highest" and "lowest" values in the population and dividing it by four (±2 SD on either side of the mean for the "normal" distribution). [22] For example, the BMI for a group of diabetics is expected to have a high value of 48 kg/m² and a low value of 16 kg/m². Hence, the "normal" range is 48 - 16 = 32, which gives an estimated SD of 8 kg/m² (32 divided by 4). The other information required for determining the sample size is the accuracy of the estimated mean, that is, how close it should be to the actual population mean. [23] In the above example, for BMI, the accuracy can be set at ±1, ±2, or ±4 kg/m²; the general rule is that the more precise the required accuracy, the greater the required sample size. [24] A summary of the information required for estimating the sample size is given in [Table 2].
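The BMI example can be worked through numerically. This is a minimal sketch, assuming the common normal-approximation formula n = (z × SD / margin)² for estimating a mean; the formula and the function names are assumptions for illustration and are not taken from the article.

```python
import math
from statistics import NormalDist


def sd_from_range(highest, lowest):
    """Rough SD estimate: expected range divided by four (about ±2 SD)."""
    return (highest - lowest) / 4


def sample_size_mean(sd, margin, confidence=0.95):
    """Approximate sample size for estimating a mean to within ±margin.

    Assumed formula (not from the article): n = (z * sd / margin)^2.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)


if __name__ == "__main__":
    sd = sd_from_range(48, 16)  # BMI example from the text: (48 - 16) / 4 = 8
    for margin in (1, 2, 4):    # required accuracy in kg/m²
        print(margin, sample_size_mean(sd, margin))
```

Halving the margin roughly quadruples the required sample size, which is the general rule described in the paragraph above.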
Table 2: Information required for estimating the required sample size

The requirements for determining the sample size for comparing two (or more) groups become more complex, since estimates are needed of the expected prevalence in both groups (for categorical variables) or the expected difference of means (for numerical variables). But the basic rule is the same: the greater the variability of the variable under study or the more precise the required accuracy, the greater the required sample size. [12] [Table 3] shows the estimated sample sizes for a categorical variable (hypertension) and a numerical variable (systolic blood pressure) when comparing these variables between smokers and nonsmokers at different levels of accuracy. It is up to the researcher to select the required criteria according to the study objectives and the available resources. It should be kept in mind that these are all based on estimates, and if the sample results are found to have more variability than was assumed in the estimation, the differences may not reach statistical significance. If there is provision for a pilot study, the prevalence or SD can be estimated more accurately from a smaller sample taken from within the study population, and the required sample size can then be determined more accurately. [25]
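The two-group case can be sketched in the same way. The code below is only an illustration, assuming the widely used normal-approximation formulas for comparing two proportions and two means with equal group sizes and no continuity correction; the article does not state how the values in Table 3 were derived, and the input values used here are hypothetical rather than taken from that table.

```python
import math
from statistics import NormalDist


def _z_total(alpha, power):
    """Sum of the critical values for a two-sided alpha and the chosen power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Assumed formula: n = (z_a + z_b)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(_z_total(alpha, power) ** 2 * variance / (p1 - p2) ** 2)


def n_per_group_means(sd, difference, alpha=0.05, power=0.80):
    """Assumed formula: n = 2 * ((z_a + z_b) * sd / difference)^2."""
    return math.ceil(2 * (_z_total(alpha, power) * sd / difference) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: hypertension in 20% of nonsmokers versus 30% of smokers
    print(n_per_group_proportions(0.20, 0.30))
    # Hypothetical inputs: systolic blood pressure with SD 20 mmHg, difference 10 mmHg
    print(n_per_group_means(sd=20, difference=10))
```

In both functions a smaller expected difference or a larger variability increases the number of subjects required per group, in line with the basic rule stated above.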
Table 3: Estimated sample sizes required for varying expected differences between two samples (nonsmokers versus smokers) for a categorical and numerical variable

  Sampling techniques

The other important issue related to sampling is selecting the required number of subjects in such a manner that the sample is representative of the study population. [7] It is a common pitfall to opt for the easier option of convenience sampling, where "all" the available persons in the study population are selected for the study until the required sample size is reached. This is nonprobability sampling, where the sample is less likely to be representative of the study population due to inherent biases in the sampling process. [13] Other forms of nonprobability sampling include purposive sampling, quota sampling, and snowball sampling, where the sample is selected according to some predetermined criteria. These types of sampling techniques are more appropriate for small-scale studies which are not meant to be generalized to a larger population. [13]

The more relevant sampling technique is called "probability sampling" or "random sampling." [26] It is important to note here that the word "random" as used in this context differs from its everyday usage. It is misleading to state that the sample was chosen at random from all the patients coming to the outpatient clinic. In order to be classified as random or probability sampling, every person in the study population must have an equal or known probability of being included in the sample. [7] It is quite common to overlook hidden biases in the sampling process which adversely affect the outcome of the study. For example, consider a study conducted to determine the satisfaction of patients coming to a health care center, in which the decision was to sample every third patient coming out of the center. On the surface, this seems to be "unbiased" as long as every third person is selected. But one hidden factor is related to the outcome of the study, that is, satisfaction with the care provided. A person who is not satisfied with the health care provided would be unlikely to return to the center or would come only once a month, while a person who is satisfied would return more frequently, maybe 2-3 times a month. Hence, it is quite likely that the satisfaction survey shows a more positive result than the actual perception. [27] One way to account for this hidden bias is to interview only "new" patients who are visiting the clinic for the first time; alternatively, it may be sufficient to ask the respondent how many times s/he has visited the clinic in the last month or year. [28]

The same kind of bias may be associated with random digit dialing for a phone survey. On the surface, the computer dials a number randomly, so there should be no bias in the sample selection. In fact, there is still a hidden bias: people who have two phone numbers (or dual-SIM phones) are twice as likely to be selected as the majority of people who have only one number. [29] People with more than one phone are more likely to have a higher income, so this may bias any study asking about their perceptions of health care insurance or even about choosing between prepaid and postpaid mobile phone services. This type of bias can be controlled for by recognizing it at the planning stage of the survey and including a question such as "How many phone numbers do you have?" in the survey. The answer can be used to appropriately weight the responses of such respondents at the final analysis stage. [29]
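The weighting step mentioned for the phone survey can be illustrated with a small example. This is a minimal sketch, assuming each response is weighted by the inverse of the number of phone numbers the respondent reports (that is, by the inverse of the relative chance of being dialed); the data and the function name are hypothetical.

```python
def weighted_proportion(responses):
    """Estimate the proportion answering "yes", down-weighting multi-phone respondents.

    responses: list of (answered_yes, n_phone_numbers) pairs.
    A person with k numbers is k times as likely to be dialed, so the weight is 1/k.
    """
    total_weight = 0.0
    yes_weight = 0.0
    for answered_yes, n_phone_numbers in responses:
        weight = 1.0 / n_phone_numbers
        total_weight += weight
        if answered_yes:
            yes_weight += weight
    return yes_weight / total_weight


if __name__ == "__main__":
    # Hypothetical survey: two respondents with one number, two with two numbers
    survey = [(True, 1), (True, 2), (False, 1), (True, 2)]
    unweighted = sum(1 for yes, _ in survey if yes) / len(survey)
    print(unweighted, weighted_proportion(survey))
```

In this made-up data the respondents with two numbers all answered "yes", so the unweighted estimate overstates the proportion and the weighted estimate is lower.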

The types of probability sampling methods include simple random sampling, systematic random sampling, and stratified random sampling [Table 4]; these three methods are more relevant when a sample frame (a list of the people in the study population) is available. [7] Simple random sampling is as simple as picking chits (names or numbers written on pieces of paper) from a box for a small study population of up to 30-50 people. For a larger study population, a computer-generated random number table can be used to select the respondents accordingly, e.g., every nth person coming out of a clinic or the nth person from each household. [7]

Systematic random sampling is applicable when the study population is relatively large (100 or more) and a list of all the members is available, e.g., employees in a hospital, medical students in a class, or even beds in a hospital. The total number of subjects in the list is divided by the required sample size to obtain the "skip number", e.g., to select 25 out of a list of 200, the skip number is 8 (200/25 = 8), so every 8th person on the list is selected. The next step is to choose a number randomly between 1 and 8, which identifies the first person selected, and then to systematically select every 8th person from the list until the end of the list is reached, e.g., 3, 11, 19, 27, …, 195. It is important to remember that the first person should be chosen randomly; arbitrarily selecting the 1st person or the 8th person on the list will lead to zero probability of the other persons in the list being selected. [30]

Stratified random sampling is a form of systematic random sampling with the addition that the list is stratified (arranged by categories) according to a predetermined characteristic, e.g., gender, level of employees, or class in medical college. After arranging the list according to the specified criterion, the same process of selecting every nth person is followed as in systematic random sampling. [30] The stratified random sampling technique ensures that the sample contains approximately the same proportion of each category of the specified characteristic as the study population. This is important when the outcome variable that is being studied is directly related to that particular characteristic, e.g., smoking and gender, or employee satisfaction and level of employees.

The other two probability sampling methods, cluster sampling and multistage sampling, are more appropriate for community-based or large-scale surveys and will not be described in detail in this article. More information on these two methods can be obtained from other detailed texts on sampling. [7],[9],[10],[15],[30] The issue of avoiding bias due to nonresponse in sampling will be discussed in detail in the next article on data collection methods.
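The three list-based methods described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the optional `start` argument, and the example staff list are hypothetical, and the stratified version follows the description in the text (sort the list by the stratifying characteristic, then select every nth person).

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n members from the sample frame, each with equal probability."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng, start=None):
    """Select every k-th member after a random start, where k is the skip number."""
    skip = len(frame) // n              # e.g. 200 // 25 = 8
    if start is None:
        start = rng.randint(1, skip)    # random start between 1 and the skip number
    positions = range(start, len(frame) + 1, skip)
    return [frame[pos - 1] for pos in positions][:n]


def stratified_systematic_sample(frame, stratum_of, n, rng, start=None):
    """Arrange the list by stratum, then apply systematic selection."""
    ordered = sorted(frame, key=stratum_of)
    return systematic_sample(ordered, n, rng, start)


if __name__ == "__main__":
    rng = random.Random(2014)           # fixed seed so the run is repeatable
    # Hypothetical list of 200 staff members, 60% labeled "F" and 40% labeled "M"
    staff = [(i, "F" if i % 5 < 3 else "M") for i in range(1, 201)]
    print(systematic_sample(list(range(1, 201)), 25, rng, start=3))
    chosen = stratified_systematic_sample(staff, lambda person: person[1], 25, rng)
    print(sum(1 for person in chosen if person[1] == "F"), "of", len(chosen))
    print(len(simple_random_sample(staff, 25, rng)))
```

With a start of 3 and a skip number of 8, the systematic selection reproduces the sequence 3, 11, 19, 27, …, 195 from the text, and sorting by stratum first keeps the proportions in the sample close to those in the list.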
Table 4: List of different probability and nonprobability sampling methods

  Conclusion

Sampling is an important consideration in all quantitative research which aims to generalize the findings of the study to a larger population. It is essential to have the required sample size as well as to select a representative sample using the appropriate sampling technique.

  References

1. Bordage G. Reasons reviewers reject and accept manuscripts: The strengths and weaknesses in medical education reports. Acad Med 2001;76:889-96.
2. University of Texas. Common Mistakes in Using Statistics: Spotting and Avoiding Them. Available from: http://www.ma.utexas.edu/users/mks/statmistakes/biasedsampling.html. [Last accessed on 2014 Sep 24; Last accessed on 2012 Aug 28].
3. Easton VJ, McColl JH. Statistics Glossary. Available from: http://www.stats.gla.ac.uk/steps/glossary/sampling.html. [Last accessed on 2014 Sep 24].
4. Trochim WM. Research Methods Knowledge Base: Sampling Terminology, [Oct 28th, 2008]. Available from: http://www.socialresearchmethods.net/kb/sampterm.php. [Last accessed on 2014 Sep 24].
5. Freedman DA. Sampling. Available from: http://www.stat.berkeley.edu/~census/sample.pdf. [Last accessed on 2014 Sep 24].
6. Population and Samples: The Principle of Generalization. Available from: http://www.cios.org/readbook/rmcs/ch05.pdf. [Last accessed on 2014 Sep 24].
7. Schutt RK, Engel RJ. Sampling. In: The Practice of Research in Social Work. 3rd ed., Ch. 5. Washington DC: SAGE Publications Inc.; 2008. Available from: http://www.sagepub.com/upm-data/24480_Ch5.pdf. [Last accessed on 2014 Sep 24].
8. Singer E. Introduction: Nonresponse bias in household surveys. Pub Opinion Quart 2006;70:637-45. Available from: http://www.poq.oxfordjournals.org/content/70/5/637.full.pdf. [Last accessed on 2014 Sep 24].
9. Watt JH, van den Berg S. Sampling. In: Research Methods for Communication Science. Ch. 6. 2002. Available from: http://www.cios.org/readbook/rmcs/ch06.pdf. [Last accessed on 2014 Sep 24].
10. Ross KE. Sample design for educational survey research. In: Quantitative Research Methods in Educational Planning. Paris: UNESCO International Institute for Educational Planning; 2005. p. 4. Available from: http://www.unesco.org/iiep/PDF/TR_Mods/Qu_Mod3.pdf. [Last accessed on 2014 Sep 24].
11. Barclay S, Todd C, Finlay I, Grande G, Wyatt P. Not another questionnaire! Maximizing the response rate, predicting non-response and assessing non-response bias in postal questionnaire studies of GPs. Fam Pract 2002;19:105-11. Available from: http://www.fampra.oxfordjournals.org/content/19/1/105.full.pdf. [Last accessed on 2014 Sep 24].
12. Israel GD. Determining Sample Size, [April 2009]. Available from: http://www.edis.ifas.ufl.edu/pdffiles/PD/PD00600.pdf. [Last accessed on 2014 Sep 25].
13. Explorable Psychology Experiments. Non-Probability Sampling, [17 May, 2009]. Available from: http://www.explorable.com/non-probability-sampling. [Last accessed on 2014 Sep 25].
14. Science Buddies. Sample Size: How Many Survey Participants Do I Need? Available from: http://www.sciencebuddies.org/science-fair-projects/project_ideas/Soc_participants.shtml. [Last accessed on 2014 Sep 25].
15. Kadam P, Bhalerao S. Sample size calculation. Int J Ayurveda Res 2010;1:55-7.
16. Andrews University. Applied statistics - Lesson 11: Power and sample size, [28 Jul, 2005]. Available from: http://www.andrews.edu/~calkins/math/edrm611/edrm11.htm. [Last accessed on 2014 Sep 25].
17. Dean AG, Sullivan KM, Soe MM. Open Epi: Open Source Epidemiological Statistics for Public Health, Version. Available from: http://www.openepi.com/Menu/OE_Menu.htm. [Last accessed on 2014 Sep 25; Last updated on 2014 Sep 22].
18. Raosoft Inc. Sample Size Calculator. 2004. Available from: http://www.raosoft.com/samplesize.html. [Last accessed on 2014 Sep 25].
19. Lenth RV. Java applets for power and sample size, [Computer software, 2006]. Available from: http://www.homepage.stat.uiowa.edu/~rlenth/Power/oldversion.html. [Last accessed on 2014 Sep 25].
20. Triola MF. Estimates and sample sizes. In: Elementary Statistics. 12th ed., Ch. 7. New York: Prentice Hall Inc.; 2012. Available from: http://www.math.wayne.edu/~menaldi/teach/others/Sta1020/ElemStat_Triola_Chapter7.pdf. [Last accessed on 2014 Sep 25].
21. National Research Council (US) Committee on Guidelines for the Use of Animals in Neuroscience and Behavioral Research. Sample Size Determination. In: Guidelines for the Use of Animals in Neuroscience and Behavioral Research. Washington DC: National Academies Press; 2003. Appendix A. Available from: http://www.ncbi.nlm.nih.gov/books/NBK43321/#a20007f55ddd00182. [Last accessed on 2014 Sep 25].
22. Henry GT. Sample size. In: Practical Sampling. Thousand Oaks CA: SAGE Publishing Inc.; 1990. p. 117-29.
23. Boston University, School of Public Health. Power and Sample Size Determination. Available from: http://www.sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Power/BS704_Power_print.html. [Last accessed on 2014 Sep 25].
24. Penn State University. Stat 100- Statistical Concepts and Reasoning. 2014. Available from: http://www.onlinecourses.science.psu.edu/stat100/node/17. [Last accessed on 2014 Sep 25].
25. Noordzij M, Tripepi G, Dekker FW, Zoccali C, Tanck MW, Jager KJ. Sample size calculations: Basic principles and common pitfalls. Nephrol Dial Transplant 2002;17:2087-93. Available from: http://www.ndt.oxfordjournals.org/content/25/5/1388.long. [Last accessed on 2014 Sep 25].
26. Doherty M. Probability versus non-probability sampling in sample surveys. New Zealand Stat Rev 1994;21-8. Available from: http://www.nss.gov.au/nss/home.nsf/75427d7291fa0145ca2571340022a2ad/768dd0fbbf616c71ca2571ab002470cd/$FILE/Probability%20 versus%20Non%20Probability%20Sampling.pdf. [Last accessed on 2014 Sep 26].
27. Lane DM. Research design. In: Online Statistics Education: An Interactive Multimedia Course of Study. Rice University, University of Houston, Tufts University. Available from: http://www.onlinestatbook.com/2/research_design/sampling.html. [Last accessed on 2014 Sep 26].
28. World Health Organization. Toolkit on Monitoring Health Systems Strengthening: Service Delivery, [June 2008]. Available from: http://www.who.int/healthinfo/statistics/toolkit_hss/EN_PDF_Toolkit_HSS_ServiceDelivery.pdf. [Last accessed on 2014 Sep 26].
29. Ferraro D, Krenzke T, Montaquila J. RDD Telephone Surveys: Reducing Bias and Increasing Operational Efficiency. Joint Statistical Meeting: Section on Survey Research Methods; 2008. p. 1949-56. Available from: http://www.amstat.org/sections/srms/proceedings/y2008/Files/301280.pdf. [Last accessed on 2014 Sep 26].
30. Daniel J. Choosing the type of probability sampling. In: Sampling Essentials. Ch. 5. SAGE Publications Inc.; 2012. Available from: http://www.sagepub.com/upm-data/40803_5.pdf. [Last accessed on 2014 Sep 26].

