Year: 2020 | Volume: 3 | Issue: 2 | Page: 317-322
Basics of statistics-3: Sample size calculation – (i)
Department of Medical Oncology and Hemato-Oncology, Narayana Superspeciality Hospital, Gurugram, Haryana, India
Date of Submission: 29-Mar-2020
Date of Decision: 28-Apr-2020
Date of Acceptance: 08-May-2020
Date of Web Publication: 19-Jun-2020
H S Darling
Narayana Superspeciality Hospital, Gurugram - 122 002, Haryana
Source of Support: None, Conflict of Interest: None
Introduction: From a practicing oncologist's perspective, sample size calculation is a very intriguing aspect of medical statistics.
Methods: Basic aspects of sample size calculation in relevant case scenarios are discussed.
Results: The formulae are illustrated with examples for easier understanding.
Discussion: This article is a brief account of sample size calculation methods in different clinical research scenarios. The derivation of the formulae is beyond the scope of this article. The discussion is kept simple by using illustrations that match real-life studies. More complex methods will be discussed in the next article of this series.
Keywords: Comparative studies, one-tailed alternative, sample size, sample size calculation, two-tailed alternative
How to cite this article:
Darling H S. Basics of statistics-3: Sample size calculation – (i). Cancer Res Stat Treat 2020;3:317-22
Introduction
Every one of us has tasted a sample of the entity called “sample size calculation” during thesis protocol submission in our postgraduation days. Most of us, with the help of a statistician, managed to arrive at an achievable number of cases for our thesis. Sample size calculation is often seen as a tough and unappealing topic, unless one has a special interest in it or has devoted time and effort to its intricacies. Principal investigators of clinical studies are usually more familiar with its procedure, limitations, and rationale. Is it important for everyone to know this? Yes. Everyone needs to know at least the rationale for sample size calculation in order to judge how applicable published studies are to day-to-day practice. Even for an avid researcher, this knowledge helps in collaborating with a biostatistician, who may not be well versed in the clinical aspects and requirements of the study. It is important to be conversant with certain terms before delving into the methodology. In this article, we cover the basic concepts of sample size calculation for simple clinical studies, with illustrative examples.
Definitions
- Clinical significance – The minimum clinically relevant difference in outcomes between the study groups that the researcher is interested in. It is the starting point for sample size calculation and is an assumption based on previous literature, pilot studies, or experience.
- Concordant and discordant pair – In a matched-pair study design, a concordant pair is a matched pair with the same outcome for both members, whereas a discordant pair is a matched pair with different outcomes for the two members.
- Confidence interval (CI) – A range of values depicting the precision of the estimate. For instance, a 95% CI means that one can be 95% confident that the true value lies within the defined range.
- Null (H0) and alternative (H1) hypotheses – The null hypothesis states that there is no association or difference in outcomes between the two comparison groups, whereas the alternative hypothesis states that there is one.
- One- and two-tailed significance – If the alternative mean or proportion is expected to be only greater or only less than the null mean, then a one-tailed test is appropriate; if it is expected to be on either side, a two-tailed test is appropriate.
- Power – The probability that a statistical test will detect a difference of a given size when such a difference truly exists, i.e., correctly reject a false null hypothesis. It is calculated as 1 − β, where β is the probability of a Type II error.
- P value – The probability of obtaining results at least as extreme as those observed if the null hypothesis were true. The smaller the P value, the stronger the evidence against the null hypothesis.
- Sample size – The number of individuals selected from a population for a study.
- Standard deviation (SD) – A measure of dispersion around the mean in continuous data, equal to the square root of the variance.
- Statistical significance – A result is statistically significant when the observed difference is unlikely to have arisen by chance alone. This is judged by whether the P value falls below a prespecified significance level (α), usually 0.05.
- Type I error – The rejection of a true null hypothesis (a false-positive result). Its probability is α, the chosen significance level. With an unduly large sample, even clinically trivial differences may become statistically significant.
- Type II error – The failure to reject a false null hypothesis, i.e., the study fails to detect a clinically relevant association or difference between the study groups. Its probability is β. It is more likely when the sample size is smaller than required.
The Importance of Sample Size
The minimum clinically meaningful difference is the most crucial factor in sample size calculation. The sample size should be large enough not to miss this difference. If the sample is too small to detect it when it exists, the comparison will be non-significant and the study will be inconclusive. The choice of a clinically important difference is not a statistical one; it depends on the context of the study, and it can be difficult to decide how big a difference would matter. Data already present in the literature, discussions with colleagues, or a pilot study may help. The sample size should be neither too large nor too small, so that real differences are not missed, resources are not wasted, and patients are not exposed to ethical problems. An unduly small study would be futile, and an unduly large one may deprive some patients of the superior treatment. For a given clinical difference, statistical significance changes with the sample size: the same difference yields a smaller P value in a larger study. We must therefore choose an optimal sample size, which requires the researcher to consider the study design first and then choose an appropriate sample size calculation method. For this reason, many funders and institutional review boards require an a priori sample size calculation to be included in the study protocol.
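A one-sample z-test shows how statistical significance depends on sample size when the clinical difference is held fixed. This is a minimal sketch. The difference (5 units), the SD (10), and the sample sizes are hypothetical values chosen only for illustration, and the SD is assumed to be known. `two_sided_p` is an illustrative helper, not a library function. The sketch relies on Python's standard `statistics.NormalDist`, which needs Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(diff: float, sd: float, n: int) -> float:
    """Two-sided P value of a one-sample z-test for a mean difference `diff`,
    assuming the SD is known (illustrative helper, not a library function)."""
    z = diff / (sd / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same hypothetical difference (5 units, SD 10) at increasing sample sizes
for n in (10, 40, 160):
    print(f"n = {n:>3}: P = {two_sided_p(5, 10, n):.2g}")
```

With 10 observations the same 5-unit difference is non-significant (P ≈ 0.11). With 40 observations it is clearly significant (P ≈ 0.002).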
How to Calculate the Sample Size
Apart from descriptive studies, studies that require a sample size calculation are designed to test a hypothesis that the researcher has formed from earlier descriptive studies or other available literature. The choice of a one- or two-tailed significance level, and of power, depends on how sure the researcher is about this hypothesis. Sample size calculation is a trade-off between missing a real difference and finding a difference that is statistically significant but clinically unimportant. For simple studies, standard formulae can be used; complex studies require specialized statistical software. Broadly, a sample size calculation serves one of two purposes. The first is to estimate a parameter with a prespecified precision (CI width). The second is to support a comparative study that tests the null hypothesis against the alternative.
Sample Size Characteristics
Both the quantity and the quality of a sample matter if it is to be a valid representation of the population under study. The sample must be representative of the underlying population of interest for the results to be generalizable to that population. Common sampling methods include convenience sampling, quota sampling, random sampling, stratified sampling, and cluster sampling.
Sample size estimation needs to take into account the required precision of the analyses, the study design, and the clinically meaningful difference. A simple example illustrates this. Consider a disease with a reported prevalence of 5%. If the sample size is 20 individuals, only 1 person would be expected to have the disease. One fewer affected individual would therefore make the observed prevalence zero, whereas one more would double it to 10%. Conversely, if the prevalence is 80%, then 16 people are likely to be positive, and one more or one fewer would change the prevalence only to 85% or 75%. Precision is defined by the CI when estimating a mean or a proportion, and by the α and β errors in hypothesis-testing studies.
[TAG:2]Methods of Sample Size Calculation[/TAG:2]
Sample size can be calculated by three methods: formulae, tables, and software. Here is a brief account of the formulae used for simple studies with normally distributed data; their derivation is beyond the scope of this article. Let us work through some hypothetical studies:
Specified precision methods
Precision methods are used when the clinically significant effect is already known and the researcher wants to estimate that effect with a defined degree of precision.
1. Estimating a mean with a specified precision. Example of a cross-sectional study
The sample size, n, is given by
$$n = \frac{4k^2\,\mathrm{SD}^2}{d^2},$$
where SD is estimated from previously published studies or from a small pilot study. d is the desired total width of the CI, decided by the researcher on the basis of the clinical significance of the study results. k is a constant that depends on the confidence level of the two-sided CI (e.g., 1.96 for 95%) [Table 1].
Suppose we wish to estimate the mean fasting blood sugar (FBS) in a group of non-diabetic patients with a 95% CI that is 10 mg/dL wide, i.e., 5 mg/dL on either side of the mean. Previous work suggests using an SD of 8.4 mg/dL.
$$n = \frac{1.96^2 \times 4 \times 8.4^2}{10^2} = 10.84 \;\Rightarrow\; n = 11 \text{ patients}$$
Halving the width of the CI quadruples the required sample size. For example, if we narrow the CI to 5 mg/dL (2.5 mg/dL on either side of the mean) at the same confidence level,
$$n = \frac{1.96^2 \times 4 \times 8.4^2}{5^2} = 43.37 \;\Rightarrow\; n = 44 \text{ patients}$$
Lowering the confidence level while keeping the same CI width reduces the sample size. For example, if the confidence level is reduced to 90% (k = 1.64),
$$n = \frac{1.64^2 \times 4 \times 8.4^2}{10^2} = 7.59 \;\Rightarrow\; n = 8 \text{ patients}$$
2. Estimating a mean with a specified precision. Example of an interventional study
Find the minimum sample size needed to estimate the change in FBS with a new drug, if the two-sided 95% CI must be no wider than 10 mg/dL and the SD of the change in FBS reported in the previous literature is 15 mg/dL.
$$n = \frac{1.96^2 \times 4 \times 15^2}{10^2} = 34.57 \;\Rightarrow\; n = 35 \text{ patients}$$
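The specified-precision formula for a mean is easy to check in code. This is a minimal Python sketch, and `n_for_mean_ci` is a hypothetical helper name, not a library function. The sketch takes k directly so that the table values used in the text (1.96 and 1.64) can be reproduced. It rounds up because a fractional patient cannot be recruited.

```python
import math

def n_for_mean_ci(sd: float, width: float, k: float) -> float:
    """Unrounded n so that a two-sided CI for a mean has total width `width`.

    Implements n = 4 * k^2 * SD^2 / d^2, where k is the normal critical
    value for the confidence level (e.g. 1.96 for 95%).
    """
    return 4 * k**2 * sd**2 / width**2

# Examples 1 and 2 from the text (SD and CI widths in mg/dL)
examples = {
    "FBS, 95% CI, width 10": n_for_mean_ci(8.4, 10, 1.96),
    "FBS, 95% CI, width 5": n_for_mean_ci(8.4, 5, 1.96),
    "FBS, 90% CI, width 10": n_for_mean_ci(8.4, 10, 1.64),
    "Change in FBS, 95% CI, width 10": n_for_mean_ci(15, 10, 1.96),
}
for label, n in examples.items():
    # Always round up: a fractional patient cannot be recruited
    print(f"{label}: n = {n:.2f} -> {math.ceil(n)} patients")
```

Dividing the second result by the first gives exactly 4. This is the "halving the CI quadruples the sample size" rule.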
3. Estimating a proportion
Sometimes a proportion (percentage) is more informative than a mean. Suppose we wish to estimate the mortality rate of COVID-19 in the Indian population with a two-sided 90% CI of width 0.01, i.e., a precision of ±0.005. An estimate of the mortality rate of COVID-19 is 0.045 (4.5%).
$$n = \frac{4k^2\,p(1 - p)}{d^2}$$
Here, the expected population proportion (as a proportion, not a percentage) is p = 0.045, the desired width of the CI is d = 0.01, and the confidence level is 90% (k = 1.64, so 4k² ≈ 10.76).
$$n = \frac{10.76 \times 0.045 \times (1 - 0.045)}{0.01^2} = 4624.11 \;\Rightarrow\; n = 4625 \text{ patients}$$
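The same check works for a proportion. This minimal sketch uses the normal-approximation formula above. `n_for_proportion_ci` is a hypothetical helper name, and k = 1.64 is the 90% value used in the text.

```python
import math

def n_for_proportion_ci(p: float, width: float, k: float) -> float:
    """Unrounded n so that a two-sided CI for a proportion has total width `width`.

    Implements n = 4 * k^2 * p * (1 - p) / d^2 (normal approximation).
    """
    return 4 * k**2 * p * (1 - p) / width**2

n = n_for_proportion_ci(p=0.045, width=0.01, k=1.64)
print(f"n = {n:.2f} -> {math.ceil(n)} patients")
```

Without intermediate rounding, this gives 4623.42, i.e., 4624 patients. Using 4k² rounded to 10.76, as in the hand calculation, gives 4624.11, i.e., 4625 patients. The one-patient difference comes only from rounding.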
Sample size estimation on the basis of hypothesis testing – two hypotheses, one-sample inference
Hypothesis testing is a probabilistic method for judging whether observed data support or refute a stated hypothesis. It is the most common basis for sample size calculation. These formulae require values from the standard normal (z) table; a short list of commonly used z values is presented in [Table 2].
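Where a printed z table is not at hand, the z values can be computed. This is a minimal sketch using Python's standard `statistics.NormalDist` (Python 3.8 or later). The helper name `z` and the list of probabilities are illustrative choices. They are not a reproduction of Table 2.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def z(prob: float) -> float:
    """Standard normal quantile: the value below which `prob` of the area lies."""
    return std_normal.inv_cdf(prob)

# z_{1-alpha/2} for two-sided tests, z_{1-alpha} for one-sided tests,
# and z_{1-beta} for the chosen power
for label, prob in [
    ("alpha = 0.05, two-sided (z at 0.975)", 0.975),
    ("alpha = 0.05, one-sided (z at 0.95)", 0.95),
    ("power 80% (z at 0.80)", 0.80),
    ("power 90% (z at 0.90)", 0.90),
]:
    print(f"{label}: {z(prob):.3f}")
```

The text uses these values rounded to 1.96, 1.645, 0.84, and 1.28.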
4. Estimating a mean (one-tailed alternative)
To test H0: μ = μ0 versus H1: μ = μ1, we assume that the data are normally distributed with mean μ and known variance σ². H0 is the null hypothesis and H1 the alternative hypothesis; μ0 and μ1 are the means of the reference and study populations, respectively. The parameters μ1 and σ² may be obtained from previous work, prior knowledge of the underlying distribution, or pilot studies. The sample size formula for a one-tailed test is
$$n = \frac{\sigma^2 \left(z_{1-\beta} + z_{1-\alpha}\right)^2}{(\mu_0 - \mu_1)^2}$$
Suppose we want to test the hypothesis that the mean heart rate (HR) of adult patients with metastatic cancer is higher than “normal.” To do so, HR recordings are obtained from 100 consecutive healthy individuals from the general population. Their mean HR is 80 bpm (μ0), with a sample SD of 18 bpm (σ). Assuming that a clinically important increase in HR in patients with metastatic cancer is 12 bpm (μ1 − μ0), let us compute the sample size needed to conduct the study, with α = 0.05, 1 − β = 0.90, and a one-sided test.
$$n = \frac{18^2 (1.28 + 1.645)^2}{12^2} = 19.25 \;\Rightarrow\; n = 20 \text{ patients}$$
5. Estimating a mean (two-tailed alternative)
Consider that, in the above example, we instead collect the HR values of well-responding patients with metastatic cancer at 3 months of treatment. We presume that the HR changes by 12 bpm in either direction, with an SD of 18 bpm. What should the sample size be with 90% power and a two-tailed α of 0.05? For a two-tailed test, the one-tailed α is replaced by α/2:
$$n = \frac{\sigma^2 \left(z_{1-\beta} + z_{1-\alpha/2}\right)^2}{(\mu_0 - \mu_1)^2} = \frac{18^2 (1.28 + 1.96)^2}{(92 - 80)^2} = 23.62 \;\Rightarrow\; n = 24 \text{ patients}$$
Note that the sample size increases as σ² increases, as α decreases, and as power increases. It decreases as |μ0 − μ1| increases.
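Examples 4 and 5 differ only in the z value used for α, so one function covers both. This is a minimal sketch with a hypothetical helper name, `n_one_sample_mean`. The z values are the table values used in the text.

```python
import math

def n_one_sample_mean(sigma: float, diff: float, z_alpha: float, z_beta: float) -> float:
    """Unrounded n for a one-sample test of a mean with known SD.

    n = sigma^2 * (z_beta + z_alpha)^2 / diff^2, where z_alpha is z_{1-alpha}
    for a one-tailed test or z_{1-alpha/2} for a two-tailed test.
    """
    return sigma**2 * (z_beta + z_alpha) ** 2 / diff**2

one_tailed = n_one_sample_mean(sigma=18, diff=12, z_alpha=1.645, z_beta=1.28)  # example 4
two_tailed = n_one_sample_mean(sigma=18, diff=12, z_alpha=1.96, z_beta=1.28)   # example 5
print(math.ceil(one_tailed), math.ceil(two_tailed))  # 20 24
```

Because σ and the difference enter as squares, doubling σ quadruples n and doubling |μ0 − μ1| quarters it.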
6. Estimation for binomial proportions (two-tailed alternative)
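The sample size is calculated as
$$n = \frac{p_0 q_0 \left(z_{1-\alpha/2} + z_{1-\beta}\sqrt{\dfrac{p_1 q_1}{p_0 q_0}}\right)^2}{(p_1 - p_0)^2},$$
where p0 is the proportion under the null hypothesis, p1 is the proportion under the alternative hypothesis, and q = 1 − p.
Suppose we wish to test the hypothesis that the prevalence of liver cancer is 8% in patients with alcoholic liver disease (ALD), compared with 3% in those with non-alcoholic liver disease. How many ALD patients will need to be studied to test this hypothesis with 80% power and a two-tailed α of 0.05?
$$n = \frac{0.03 \times 0.97 \left(1.96 + 0.84\sqrt{\dfrac{0.08 \times 0.92}{0.03 \times 0.97}}\right)^2}{0.05^2} = 126.44 \;\Rightarrow\; n = 127 \text{ patients}$$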
We can also use the one-tailed alternative by replacing α/2 with α in the equation.
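The one-sample binomial formula can be coded the same way. This is a minimal sketch using the normal approximation. `n_one_sample_proportion` is a hypothetical helper name, and the z values are those used in the text.

```python
import math

def n_one_sample_proportion(p0: float, p1: float, z_alpha: float, z_beta: float) -> float:
    """Unrounded n for a one-sample test of a binomial proportion (normal approximation).

    n = p0*q0 * (z_alpha + z_beta * sqrt(p1*q1 / (p0*q0)))^2 / (p1 - p0)^2,
    with z_alpha = z_{1-alpha/2} (two-tailed) or z_{1-alpha} (one-tailed).
    """
    q0, q1 = 1 - p0, 1 - p1
    ratio = math.sqrt((p1 * q1) / (p0 * q0))
    return p0 * q0 * (z_alpha + z_beta * ratio) ** 2 / (p1 - p0) ** 2

n = n_one_sample_proportion(p0=0.03, p1=0.08, z_alpha=1.96, z_beta=0.84)
print(f"n = {n:.2f} -> {math.ceil(n)} ALD patients")
```

Passing `z_alpha=1.645` instead of 1.96 gives the one-tailed version, which needs fewer patients.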
Sample size estimation on the basis of hypothesis testing – two hypotheses, two-sample inference
7. Comparative studies: Estimating means (pre- and post-intervention or parallel groups)
First method: Using a single SD value.
$$n = \frac{2c\,\mathrm{SD}^2}{\Delta^2} \text{ (per group)},$$
where Δ is the minimum clinically relevant difference and c is a constant taken from [Table 3] (for two-tailed α).
For example, suppose we want to determine the effect of exercise on weight reduction in a group of adults compared with an equal-sized no-exercise group, with SD = 10 kg, Δ = 5 kg, α = 0.05, and power = 90% (c = 10.5). The sample size required would be
$$n = \frac{2 \times 10.5 \times 10^2}{5^2} = 84 \text{ adults in each group}$$
For 80% power, c = 7.85, which gives 63 adults in each group.
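This minimal sketch assumes that the Table 3 constant has the form c = (z1−α/2 + z1−β)². That form is consistent with the 10.5 used above for α = 0.05 and 90% power. The helper names are hypothetical.

```python
import math

def c_constant(z_alpha: float, z_beta: float) -> float:
    """Assumed form of the Table 3 constant: c = (z_{1-alpha/2} + z_{1-beta})^2."""
    return (z_alpha + z_beta) ** 2

def n_per_group_means(sd: float, delta: float, c: float) -> float:
    """Unrounded n per group for comparing two means with a common SD: 2*c*SD^2/delta^2."""
    return 2 * c * sd**2 / delta**2

print(round(c_constant(1.96, 1.28), 2))  # alpha 0.05, power 90%
print(round(c_constant(1.96, 0.84), 2))  # alpha 0.05, power 80%
print(math.ceil(n_per_group_means(sd=10, delta=5, c=10.5)))
```

With c = 10.5 (90% power), the result is 84 per group. With 80% power (c ≈ 7.85), about 63 per group would be needed.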
8. Comparative studies – estimating means
Second method: Different SD for two groups.
At an oncology center, a chart review shows that the mean age of ten consecutive patients with breast cancer was 58 years, with an SD of 8 years (σ1). The mean age of ten consecutive patients with ovarian cancer was 63 years, with an SD of 6 years (σ2). We want to detect this 5-year difference (Δ) between the means of two normally distributed samples of equal size, using a two-tailed test with significance level α = 0.05 and power = 80%. The sample size required will be
$$n = \frac{(\sigma_1^2 + \sigma_2^2)\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\Delta^2} = \frac{(8^2 + 6^2)(1.96 + 0.84)^2}{5^2} = 31.36 \;\Rightarrow\; n = 32 \text{ patients in each group}$$
9. Comparative studies – estimating means – unequally sized groups
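$$n_1 = \frac{\left(\sigma_1^2 + \sigma_2^2/k\right)\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\Delta^2}, \qquad n_2 = k\,n_1,$$
where k = n2/n1 is the ratio of the two sample sizes.
Suppose we have twice as many cases of breast cancer as ovarian cancer (k = 0.5). The sample sizes required to compare the means of two normally distributed samples of unequal size, using a two-tailed test with significance level α = 0.05 and power = 80%, will be
$$n_1 = \frac{(8^2 + 6^2/0.5)(1.96 + 0.84)^2}{5^2} = 42.65 \;\Rightarrow\; n_1 = 43 \text{ patients}$$
$$n_2 = k\,n_1 = 21.32 \;\Rightarrow\; n_2 = 22 \text{ patients}$$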
If n2 is calculated first, then n1 = n2/k.
To perform a one-tailed test, simply substitute α for α/2.
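Examples 8 and 9 use the same formula. Setting k = 1 gives equal groups. This minimal sketch uses a hypothetical helper, `n_two_means`, and the z values from the text.

```python
import math

def n_two_means(sd1: float, sd2: float, delta: float, z_alpha: float, z_beta: float,
                k: float = 1.0):
    """Return unrounded (n1, n2) for comparing two means with unequal SDs.

    n1 = (sd1^2 + sd2^2 / k) * (z_alpha + z_beta)^2 / delta^2, n2 = k * n1,
    where k = n2 / n1 (k = 1 gives equal group sizes).
    """
    n1 = (sd1**2 + sd2**2 / k) * (z_alpha + z_beta) ** 2 / delta**2
    return n1, k * n1

equal = n_two_means(8, 6, 5, 1.96, 0.84)           # example 8
unequal = n_two_means(8, 6, 5, 1.96, 0.84, k=0.5)  # example 9
print([math.ceil(x) for x in equal], [math.ceil(x) for x in unequal])
```

This prints [32, 32] for equal groups and [43, 22] when there are twice as many breast cancer cases as ovarian cancer cases.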
10. Comparative studies – estimating binomial proportions (independent samples)
$$n = \frac{c\,(p_1 q_1 + p_2 q_2)}{\Delta^2} \text{ (per group)},$$
where p1 and p2 are the expected population proportions in Groups 1 and 2; q1 = 1 − p1 and q2 = 1 − p2 are the complementary probabilities; Δ = p1 − p2 is the difference in proportions; and c is the constant from [Table 3].
For example, consider a cross-sectional study comparing the presence of invasive carcinoma on biopsy of persistent oral lesions in tobacco-exposed versus unexposed patients, with p1 = 0.09 and p2 = 0.01. The required sample size with α = 0.05 and power = 90% (c = 10.5) would be
$$n = \frac{10.5\,[0.09(1 - 0.09) + 0.01(1 - 0.01)]}{(0.09 - 0.01)^2} = 150.6 \;\Rightarrow\; n = 151 \text{ adults in each group}$$
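The constant-based formula for two proportions takes only a few lines. This is a minimal sketch; `n_two_props_c` is a hypothetical helper name, and c = 10.5 is the value used in the text for α = 0.05 and 90% power.

```python
import math

def n_two_props_c(p1: float, p2: float, c: float) -> float:
    """Unrounded n per group: c * (p1*q1 + p2*q2) / (p1 - p2)^2 (independent samples)."""
    return c * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

n = n_two_props_c(0.09, 0.01, c=10.5)  # alpha 0.05, power 90%
print(f"n = {n:.1f} -> {math.ceil(n)} per group")
```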
11. Comparative studies – estimating binomial proportions (independent samples)
Second method (equal or unequal groups),
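$$n_1 = \frac{\left(\sqrt{\bar p \bar q\,(1 + 1/k)}\; z_{1-\alpha/2} + \sqrt{p_1 q_1 + p_2 q_2/k}\; z_{1-\beta}\right)^2}{\Delta^2}, \qquad n_2 = k\,n_1,$$
where k = n2/n1, $\bar p = (p_1 + k p_2)/(1 + k)$, and $\bar q = 1 - \bar p$.
Taking the same case as example 10, with equal groups (k = 1, so $\bar p = 0.05$ and $\bar q = 0.95$) and 90% power,
$$n = \frac{\left(\sqrt{0.05 \times 0.95 \times 2} \times 1.96 + \sqrt{0.09 \times 0.91 + 0.01 \times 0.99} \times 1.28\right)^2}{0.08^2} = 153.7 \;\Rightarrow\; n = 154 \text{ adults in each group}$$
This is almost the same sample size as in example 10.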
12. Comparative studies – estimating binomial proportions (independent samples)
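As oral lesions are more common in smokers than in non-smokers, let us calculate the sample size for unequal groups, with three times as many smokers as non-smokers.
Here, k = n2/n1 = 1/3, so $\bar p = (0.09 + 0.01/3)/(1 + 1/3) = 0.07$ and $\bar q = 0.93$. Using the equation from example 11,
$$n_1 = \frac{\left(\sqrt{0.07 \times 0.93 \times 4} \times 1.96 + \sqrt{0.09 \times 0.91 + 0.01 \times 0.99 \times 3} \times 1.28\right)^2}{0.08^2} = 318.5 \;\Rightarrow\; n_1 = 319 \text{ patients}$$
$$n_2 = k\,n_1 = \tfrac{1}{3} \times 318.5 = 106.2 \;\Rightarrow\; n_2 = 107 \text{ patients}$$
This unequal allocation needs more patients in total (319 + 107 = 426) than the equal allocation in example 11 (2 × 154 = 308).
To perform a one-tailed rather than a two-tailed test, simply substitute α for α/2 in the sample size formula from example 11.
For a fixed Δ, the sample size increases as p1 and p2 move closer to 0.5, where p(1 − p) is largest. It also increases as Δ decreases.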
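The unequal-allocation formula for two independent proportions can be coded directly to cross-check the arithmetic in examples 11 and 12. This is a minimal sketch. `n_two_props` is a hypothetical helper name, and the z values (1.96 and 1.28) are the table values used in the text.

```python
import math

def n_two_props(p1, p2, z_alpha, z_beta, k=1.0):
    """Return unrounded (n1, n2) for two independent proportions, with k = n2 / n1.

    n1 = (sqrt(pbar*qbar*(1 + 1/k)) * z_alpha
          + sqrt(p1*q1 + p2*q2/k) * z_beta)^2 / (p1 - p2)^2,  n2 = k * n1,
    where pbar = (p1 + k*p2) / (1 + k) and qbar = 1 - pbar.
    """
    q1, q2 = 1 - p1, 1 - p2
    pbar = (p1 + k * p2) / (1 + k)
    qbar = 1 - pbar
    numerator = (math.sqrt(pbar * qbar * (1 + 1 / k)) * z_alpha
                 + math.sqrt(p1 * q1 + p2 * q2 / k) * z_beta) ** 2
    n1 = numerator / (p1 - p2) ** 2
    return n1, k * n1

equal = n_two_props(0.09, 0.01, 1.96, 1.28)           # example 11, k = 1
unequal = n_two_props(0.09, 0.01, 1.96, 1.28, k=1/3)  # example 12, smokers 3:1
for label, (n1, n2) in [("k = 1", equal), ("k = 1/3", unequal)]:
    print(f"{label}: n1 = {n1:.1f} -> {math.ceil(n1)}, n2 = {n2:.1f} -> {math.ceil(n2)}")
```

With k = 1, the sketch reproduces the 153.7 of example 11. With k = 1/3, it gives n1 ≈ 318.5 and n2 ≈ 106.2, i.e., 319 and 107 patients, so any hand calculation of example 12 can be checked against these values.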
13. Comparative studies – estimating binomial proportions (paired sample case)
A matched-pair design is used, in which patients are matched for age and clinical stage of the disease, with one patient in each matched pair assigned to treatment A and the other to treatment B. Here, we use McNemar's test for correlated proportions. The null hypothesis is that, among the discordant pairs, each type of discordance occurs 50% of the time: H0: pA = 1/2 versus H1: pA ≠ 1/2. Here pA is the probability that a discordant pair is of type A, i.e., the treatment A member of the pair has the event and the treatment B member does not. The number of matched pairs is calculated by
$$n = \frac{\left(z_{1-\alpha/2} + 2 z_{1-\beta}\sqrt{p_A q_A}\right)^2}{4\,(p_A - 0.5)^2\, p_D},$$
where qA = 1 − pA and pD is the expected proportion of discordant pairs.
Suppose we want to compare surgical resection (treatment A) with stereotactic radiation therapy (treatment B) in a matched-pair trial in patients with lung cancer and solitary central nervous system metastases. The outcome is recurrence, progression, or death within 6 months. Patients are matched for age group, histology, performance status, and comorbidities, with one patient in each matched pair assigned to treatment A and the other to treatment B. Based on the previous literature, it is estimated that 85% of the matched pairs will have a concordant response and 15% a discordant response (pD = 0.15). In a concordant pair, both members recur, progress, or die within 6 months, or both remain alive without progression or recurrence. In one out of three discordant pairs (pA = 1/3), the treatment A patient will recur, progress, or die, whereas the treatment B patient will not. Let us calculate the sample size with 80% power and a two-tailed α of 0.05:
$$n = \frac{\left(1.96 + 2 \times 0.84\sqrt{\tfrac{1}{3} \times \tfrac{2}{3}}\right)^2}{4\left(\tfrac{1}{3} - 0.5\right)^2 \times 0.15} = 454.4 \;\Rightarrow\; n = 455 \text{ matched pairs}$$
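The matched-pair (McNemar) formula can be checked the same way. This is a minimal sketch. `n_matched_pairs` is a hypothetical helper name; pA = 1/3 and pD = 0.15 are the assumptions of the example above, and the z values are those used in the text.

```python
import math

def n_matched_pairs(p_a: float, p_discordant: float, z_alpha: float, z_beta: float) -> float:
    """Unrounded number of matched pairs for McNemar's test.

    n = (z_alpha + 2*z_beta*sqrt(pA*qA))^2 / (4*(pA - 0.5)^2 * pD), where pA is the
    probability that a discordant pair is of type A and pD the discordant proportion.
    """
    q_a = 1 - p_a
    numerator = (z_alpha + 2 * z_beta * math.sqrt(p_a * q_a)) ** 2
    return numerator / (4 * (p_a - 0.5) ** 2 * p_discordant)

n = n_matched_pairs(p_a=1/3, p_discordant=0.15, z_alpha=1.96, z_beta=0.84)
print(f"n = {n:.1f} -> {math.ceil(n)} matched pairs")
```

Without intermediate rounding, this gives about 454.4, i.e., 455 matched pairs. Hand calculations that round intermediate values such as √(2/9) can differ slightly. Doubling the expected proportion of discordant pairs halves the number of pairs needed.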
Limitations
The calculations above assume simple, normally distributed, complete data, with no attrition or crossover. In practice, attrition and crossover need to be factored in. Clinical trials may also involve skewed data distributions, clustering, multiple dependent and independent variables, continuous outcome variables, and noninferiority or equivalence comparisons. Such issues, and more complex equations, will be dealt with in the next article of this series.
Let Us Test What We Have Learned
Q1: Choose the false statement about Type II error
- It is the complement of power
- It is denoted as β
- It is the acceptance of a true null hypothesis
- It may occur when the sample size is less than required.
Q2: Choose the wrong option. Clinical significance is
- Same as statistical significance
- Most crucial for sample size calculation
- An arbitrary assumption
- Only important at study completion.
Q3: Prerequisites for sample size calculation do not include
- Clinically relevant difference
- Statistical significance
- Study design
Q4: A two-tailed α denotes
- The test arm may be better or worse than the control arm
- Study is not statistically significant
- Null hypothesis is correct
- Sample size will be half as compared to one-tailed α.
Q5. Which of the following is incorrect about specified precision methods?
- Null hypothesis is true
- Width of CI is factored in
- The effect of study drug is already known
- These are one-sample studies.
Q6. In estimating the means by one-sample hypothesis testing, sample size is directly proportional to
- difference between the means
Q7: A discordant pair in a paired sample design
- Performs better than a concordant pair
- Performs worse than a concordant pair
- Crosses over to the opposite arm
- Has unequal outcomes in both the members.
Q8: If we wish to test the hypothesis that rural women in India are at a higher risk of cervical cancer than urban women, which test should be used to calculate the sample size of a study on the prevalence of cervical cancer in rural Indian women, if we presume that the prevalence rate of cervical cancer in urban women is 15/100,000 and rural women is 50/100,000, taking a power of 80% and α = 0.05?
- Estimating binomial proportions – paired sample case
- Estimation for binomial proportions (one-sample, one-tailed alternative)
- Estimation for binomial proportions (two-sample, two-tailed alternative)
- Estimation of proportion by specified precision method.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Answers: 1 (c), 2 (d), 3 (b), 4 (a), 5 (a), 6 (c), 7 (d), 8 (b).
References
Peacock JL, Peacock PJ. Oxford Handbook of Medical Statistics. New York: Oxford University Press Inc.; 2011.
Rosner B. Fundamentals of Biostatistics. 8th ed. Boston, MA: Cengage Learning; 2015. p. 211, 240, 241, 244, 245, 257, 307, 308, 403.
Darling HS. Basics of statistics-1. Cancer Res Stat Treat 2019;2:163-8.
Hickey GL, Grant SW, Dunning J, Siepe M. Statistical primer: Sample size and power calculations-why, when and how? Eur J Cardiothorac Surg 2018;54:4-9.
Darling HS. Basics of statistics – 2: Types of clinical studies. Cancer Res Stat Treat 2020;3:100-9.
[Table 1], [Table 2], [Table 3]