Year: 2022 | Volume: 5 | Issue: 1 | Page: 139-144
Are you confident about your confidence in confidence intervals?
Department of Medical Oncology and Hemato-oncology, Command Hospital Air Force, Bengaluru, Karnataka, India
Date of Submission: 02-May-2021
Date of Decision: 03-Apr-2022
Date of Acceptance: 03-Oct-2022
Date of Web Publication: 31-Mar-2022
H S Darling
Department of Medical Oncology and Hemato-oncology, Command Hospital Air Force, Bengaluru, Karnataka - 560 007
Source of Support: None, Conflict of Interest: None
The confidence interval (CI) is a commonly used parameter in the statistical analysis of clinical studies. Although it is highly informative and easily interpretable, the CI is subject to certain oversimplifications that occasionally lead to distorted conclusions. This review provides an overview of the characteristics, uses, and shortcomings of the CI, along with recommendations to guide its best use. We searched the PubMed and Cochrane databases using the search terms “clinical importance,” “confidence interval,” “confidence level,” “point estimate,” and “statistical significance,” and selected the articles directly relevant to this review. The purpose of the review is to familiarize clinicians with the concept of the CI and to equip them with the basic understanding needed to interpret the results of scientific publications.
Keywords: Clinical importance, confidence interval, confidence level, point estimate, statistical significance
How to cite this article:
Darling HS. Are you confident about your confidence in confidence intervals? Cancer Res Stat Treat 2022;5:139-44.
Introduction
Medical statistics is an indispensable part of evidence-based clinical practice. Clinicians often struggle to interpret the published results of clinical studies, even though these results are central to optimal patient care. How research outcomes can be accurately implemented in practice is itself an important field of ongoing research. The fifth Sikh Guru, Sri Guru Arjan Dev Ji, said, “Khojaŧ khojaŧ sunee īh soī” (By searching and seeking, I have heard this news). In the previous issue of Cancer Research Statistics and Treatment, we discussed the controversies surrounding the P value. In this article, we critically examine the confidence interval (CI).
Often, we read only the abstracts of newly published research articles and attempt to apply the results in the clinic. The abstracts of two studies with similar results may reach radically different conclusions. Such discrepancies may arise from treating statistical significance as strictly dichotomous and from misinterpreting subtle differences between CIs that either just fail to reach or marginally cross the threshold of statistical significance. Clinicians understand the conventional concept of statistical significance far better than the relevance of the location and size of the effect measure, including its CI. In this review, we aim to equip the practicing oncologist with the questions that need to be asked and answered when evaluating an article that draws conclusions about the effectiveness of an intervention on the basis of CIs.
Methods
We conducted an online literature search of various databases, including PubMed and Cochrane, following a planned scheme, as depicted in [Figure 1]. The search terms included “clinical importance,” “confidence interval,” “confidence level,” “point estimate,” and “statistical significance.” We identified 3211 articles from the database search. After removing 726 duplicate records, we screened 2485 articles for relevance to this review. We excluded 2476 articles because their content was not relevant to this manuscript and included 9 articles that contained relevant information and illustrations. We have provided some hypothetical examples to explain the numerical concepts, as a discussion of the complete calculations is beyond the scope of this article.
Figure 1: The flow diagram depicting the search and selection process of the articles selected for inclusion in the review article on the controversies surrounding the confidence interval
Defining CI
A clinical study is conceived from the idea that an intervention has an effect. The null hypothesis assumes that there is no difference between the test and control arms. A statistical test assesses how likely it would be to observe a difference at least as large as the one found if chance alone were operating, that is, if the null hypothesis were true. On the basis of the statistical significance of the observed difference in outcomes, we either reject or fail to reject the null hypothesis. Most simple statistical interpretations do not account for other important factors such as clinical significance, precision of the estimate, and statistical power. A point estimate is the observed difference or ratio between the arms of a study population. The CI of a study result is the range of values expected to contain the true population value with a defined degree of certainty (the confidence level). Simply stated, if we repeated the study many times with random samples from the same population, a defined percentage of the calculated CIs (for example, 95%) would contain the true value. The confidence level is selected arbitrarily by the researcher; it is conventionally set at 95%, and less often at 90% or 99%. The two extremes of the CI are called the limits or bounds. CIs can be one sided or two sided. A two-sided CI has both a lower and an upper bound, whereas a one-sided CI specifies only a lower or only an upper bound. The CI is expressed as a range of values. By convention, the two limits are separated by a comma, a dash, or the word “to” and are enclosed in brackets.
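The repeated-sampling definition above can be checked with a small simulation. The Python sketch below assumes a hypothetical normally distributed population with a known mean of 50 and SD of 10, draws many samples of 100, builds a z-based 95% CI for the mean from each, and counts how often the intervals contain the true mean. All parameter values are arbitrary illustrations, not data from any study.

```python
import random
from statistics import NormalDist, fmean, stdev


def coverage_rate(true_mean=50.0, true_sd=10.0, n=100, n_studies=2000,
                  level=0.95, seed=1):
    """Simulate repeated hypothetical studies and return the fraction of
    z-based CIs for the mean that contain the true population mean."""
    rng = random.Random(seed)                         # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)     # two-sided critical value
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = fmean(sample)
        se = stdev(sample) / n ** 0.5                 # estimated standard error
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / n_studies


print(coverage_rate())  # a value close to 0.95
```

Each individual interval either contains 50 or it does not; the 95% describes how often the procedure succeeds across many repetitions.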
Origin
The concept of CIs was devised by the Polish mathematician and statistician Jerzy Neyman in 1930. He proposed that an estimate calculated from a sample drawn from a large population cannot be assumed to equal the corresponding population parameter exactly. Hence, there is a need to estimate the limits “between which the true value presumably falls.” He named this interval the CI. Interval estimates can be computed within two broad frameworks: the frequentist approach, which yields the CI, and the Bayesian approach, which yields the analogous credible interval.
How to Calculate the CI
From a sample drawn from a population, a point estimate of the parameter of interest is derived. The standard error (SE) reflects the precision of this estimate. The sampling distribution of the estimate (for example, the normal distribution) is then used to calculate a CI for the parameter. The CI usually extends on both sides of the point estimate by some multiple of the SE.
The CI is calculated using the following formula:
CI = PE ± margin of error = PE ± z × SE(PE)
where CI = confidence interval, PE = point estimate, z = critical value, and SE = standard error.
The point estimate is the value calculated from the sample data. The critical value (z) depends on the confidence level and is taken from the standard normal distribution. For two-sided confidence levels of 90%, 95%, and 99%, the corresponding z values are approximately 1.645, 1.96, and 2.576 (commonly rounded to 1.65, 1.96, and 2.58). The SE is inversely proportional to the square root of the sample size and directly proportional to the variability of the data.
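The general formula can be sketched in Python using the standard library's `statistics.NormalDist`. The helper names `z_critical` and `confidence_interval` are our own, and the point estimate and SE in the last line are hypothetical. The sketch also shows that a one-sided 95% bound uses the same critical value as a two-sided 90% CI.

```python
from statistics import NormalDist


def z_critical(level, two_sided=True):
    """Critical value of the standard normal distribution for a confidence level."""
    tail = (1 - level) / 2 if two_sided else (1 - level)
    return NormalDist().inv_cdf(1 - tail)


def confidence_interval(point_estimate, se, level=0.95):
    """Two-sided CI = PE ± z × SE, using the normal approximation."""
    margin = z_critical(level) * se          # margin of error
    return point_estimate - margin, point_estimate + margin


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576

print(round(z_critical(0.95, two_sided=False), 3))  # 1.645 (one-sided 95%)
print(confidence_interval(30, 5))                   # hypothetical PE = 30, SE = 5
```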
The CI of the mean is calculated as follows:
CI = sample mean ± z × SE of the mean (SEM) = sample mean ± z × (SD/√n)
where CI = confidence interval, z = critical value, SE = standard error, SD = standard deviation, and n = sample size.
Because the population standard deviation is usually unknown, the sample standard deviation is used as a surrogate for it; this is acceptable in a fairly large random sample (conventionally, n > 30).
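As a minimal sketch, the large-sample formula for the CI of a mean can be written from summary statistics alone. The mean of 120, SD of 15, and n = 100 in the usage line are hypothetical, and the function name is our own.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_z(mean, sd, n, level=0.95):
    """CI for a mean: sample mean ± z × SD/√n.

    The sample SD stands in for the unknown population SD, which is reasonable
    only for a fairly large random sample (conventionally n > 30).
    """
    sem = sd / sqrt(n)                                 # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)      # about 1.96 for 95%
    return mean - z * sem, mean + z * sem


low, high = mean_ci_z(mean=120, sd=15, n=100)
print(round(low, 2), round(high, 2))  # 117.06 122.94
```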
If the sample is small and the population standard deviation is unknown, the CI is derived using the t distribution rather than the normal distribution. In this case, z is replaced by the critical value of the t distribution with (n − 1) degrees of freedom, which can be read from standard t distribution tables or obtained from statistical software.
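The small-sample version swaps z for a t critical value. Python's standard library has no t distribution, so this sketch hard-codes a few two-sided 95% values from a standard t table (df = 4, 9, 19, 29); with SciPy installed, `scipy.stats.t.ppf(0.975, df)` returns the same quantities for any df. The summary statistics in the usage line are hypothetical.

```python
from math import sqrt

# Two-sided 95% critical values of t (0.975 quantile), from a standard t table.
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def mean_ci_t95(mean, sd, n):
    """95% CI for a mean from a small sample: mean ± t(n - 1) × SD/√n."""
    df = n - 1                                   # degrees of freedom
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated t value for df = {df}")
    sem = sd / sqrt(n)
    t = T_CRIT_95[df]
    return mean - t * sem, mean + t * sem


low, high = mean_ci_t95(mean=120, sd=15, n=10)
print(round(low, 2), round(high, 2))  # 109.27 130.73
```

With only 10 observations, t = 2.262 rather than z = 1.96, so the interval is wider, reflecting the extra uncertainty of estimating the SD from a small sample.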
Unlike continuous variables, categorical variables are presented as counts or proportions. Using the normal approximation, the CI of a proportion is
CI = sample proportion (p) ± z × SE of the proportion = p ± z × √[p(1 − p)/n]
where CI = confidence interval, p = sample proportion, z = critical value, SE = standard error, and n = sample size.
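A minimal sketch of the normal-approximation (often called Wald) interval for a proportion follows; the 60 responders out of 200 patients are hypothetical. Clipping the limits to 0 and 1 is our own addition, because this approximation can stray outside that range for small samples or extreme proportions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(events, n, level=0.95):
    """Normal-approximation CI for a proportion: p ± z × sqrt(p(1 - p)/n)."""
    p = events / n                                   # sample proportion
    se = sqrt(p * (1 - p) / n)                       # SE of the proportion
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


p, low, high = proportion_ci(events=60, n=200)
print(p, round(low, 3), round(high, 3))  # 0.3 0.236 0.364
```

For small samples or proportions near 0 or 1, other methods such as the Wilson score interval are generally preferred.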
Distribution-free (nonparametric) statistical methods make no assumptions about the probability distribution of the data being assessed. Because the SE cannot be determined in the usual way with these methods, the confidence bounds are not necessarily equidistant from the sample estimate; instead, they correspond to actual observations in the sample. For example, in a dataset of 100 observations ranked in ascending order, the median is the average of the 50th and 51st values, and the lower and upper 95% confidence bounds for the median are the 40th and 61st ranked observations, respectively.
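The 40th and 61st ranks for n = 100 can be reproduced with one commonly used normal-approximation rule for the ranks bounding a CI for the median: lower rank = n/2 − 1.96√n/2 and upper rank = 1 + n/2 + 1.96√n/2, each rounded to the nearest integer. The article does not state which rule it used, so treat this as an assumption; other rules (for example, exact binomial methods) also exist.

```python
from math import sqrt


def median_ci_ranks(n, z=1.96):
    """1-based ranks of the ordered observations that bound an approximate CI
    for the median (normal-approximation rule; z = 1.96 for 95%)."""
    half_width = z * sqrt(n) / 2
    lower = round(n / 2 - half_width)
    upper = round(1 + n / 2 + half_width)
    if lower < 1 or upper > n:
        raise ValueError("sample too small for this approximation")
    return lower, upper


def median_ci(values, z=1.96):
    """Distribution-free CI for the median: the bounds are actual observations."""
    ordered = sorted(values)
    lower, upper = median_ci_ranks(len(ordered), z)
    return ordered[lower - 1], ordered[upper - 1]


print(median_ci_ranks(100))  # (40, 61), the ranks quoted in the text
```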
Bootstrapping and Jackknifing
Bootstrapping and Jackknifing are computer-based resampling algorithms. Bootstrapping is a simulation process used to calculate the CI for a variable without making assumptions about the probability distribution of its sample estimate. Typically, at least 1000 bootstrap samples are generated from the original sample. Each bootstrap sample has the same size as the original and is drawn with replacement, so some original values appear more than once and others not at all. Every bootstrap sample yields an estimate of the parameter, and the distribution of these estimates is used to calculate the CI. Jackknifing uses a different approach: from an original sample of size n, n subsamples are formed by leaving out one observation at a time, yielding n estimates of the variable. The variability of these estimates is used to derive the SE and, from it, the CI.
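Both resampling ideas can be sketched in a few lines of Python. The data values are hypothetical, the percentile method shown is only one of several ways to turn bootstrap estimates into a CI, and the jackknife interval assumes an approximately normal sampling distribution so that the jackknife SE can be plugged into estimate ± 1.96 × SE.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def bootstrap_percentile_ci(data, stat=fmean, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample with replacement, same size as the data."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    lo_idx = round(alpha * (n_boot - 1))            # nearest-rank lower percentile
    hi_idx = round((1 - alpha) * (n_boot - 1))      # nearest-rank upper percentile
    return estimates[lo_idx], estimates[hi_idx]


def jackknife_se(data, stat=fmean):
    """Leave-one-out jackknife standard error of a statistic."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]   # n estimates
    loo_mean = fmean(loo)
    return sqrt((n - 1) / n * sum((x - loo_mean) ** 2 for x in loo))


def jackknife_ci(data, stat=fmean, z=1.96):
    """Approximate CI: full-sample estimate ± z × jackknife SE."""
    est, se = stat(data), jackknife_se(data, stat)
    return est - z * se, est + z * se


data = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.2, 3.3, 4.4, 5.8]  # hypothetical
print(bootstrap_percentile_ci(data))
print(jackknife_ci(data))
# For the mean, the jackknife SE equals the familiar SD/sqrt(n):
print(jackknife_se(data), stdev(data) / sqrt(len(data)))
```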
Interpretation
Conventionally, a P value of 0.05 is taken as the cutoff for statistical significance. Although this threshold was originally chosen arbitrarily, it is often misused to dogmatically label an intervention as either successful or futile, as highlighted in our previous review. Similarly, when CIs are used to determine statistical significance, the “no-effect” value is 1 for relative risks and odds ratios and 0 for absolute risk differences and weighted mean differences; a 95% CI that includes this value indicates a result that is not statistically significant at the 5% level.
The CI is a measure of the uncertainty around the point estimate. The SE accounts for the sampling error that arises from chance alone. There are other possible sources of uncertainty, for example, the reliability of the probability model, the randomness of sampling, and the quality of sampling. Unfortunately, inappropriate study conclusions arising from such uncertainty may lead to years of unintentionally harmful medical practice. Nevertheless, the point estimate indicates the effect size, and the CI indicates the precision of that effect size. The width of the CI, the confidence level, and the distance of its lower or upper bound from the no-effect value are three important points a clinician should look for. Suppose a new molecule tested in a study (study A) of 400 patients showed a 30% higher response rate than placebo. A 95% CI of 20% to 40% means one can be 95% confident that the true difference lies between 20% and 40%. The same molecule, when tested in another study (study B) of 200 patients, showed the same 30% difference from placebo, but with a broader 95% CI of −5% to +45%. As this CI includes 0, a true difference of zero (no benefit over placebo) remains compatible with the data, and the result is not statistically significant at the 5% level.
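The checks described above (the width of the CI, the distance of the nearer bound from the no-effect value, and whether that value is excluded) are simple enough to express as a small helper. This is a hypothetical sketch; the function name and dictionary keys are our own, and the usage lines reuse the hypothetical study A and study B intervals.

```python
def describe_ci(lower, upper, no_effect=0.0):
    """Summarise a CI relative to the no-effect value.

    Use no_effect = 0 for differences (absolute risk, mean difference) and
    no_effect = 1 for ratios (relative risk, odds ratio).
    """
    excludes = not (lower <= no_effect <= upper)
    # Distance from the no-effect value to the nearer bound (0 if it lies inside the CI)
    gap = min(abs(lower - no_effect), abs(upper - no_effect)) if excludes else 0
    return {"width": upper - lower, "excludes_no_effect": excludes, "gap": gap}


print(describe_ci(20, 40))   # study A: {'width': 20, 'excludes_no_effect': True, 'gap': 20.0}
print(describe_ci(-5, 45))   # study B: {'width': 50, 'excludes_no_effect': False, 'gap': 0}
```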
Factors Affecting the Width of the CI
Precision is one of the most crucial aspects of any information. The width of the CI determines the practical applicability of the estimated value. Imagine that a pharmaceutical company representative comes to your office to inform you about the launch of a new molecule in India. You are very excited and ask him about the tentative price of the molecule. He applies the concept of a 99% CI and estimates that the price will be somewhere between ₹10,000 and ₹10,00,000 per month. Although he is highly unlikely to be wrong, the range is so wide that you will not know what to tell the patient you plan to treat with that molecule. Conversely, if he estimates on the basis of a 95% CI that the price will be between ₹1,00,000 and ₹2,00,000, he is relatively more likely to be wrong, but the information is more useful to you. Three factors influence the width of the CI: the confidence level, the sample size, and the variability in the sample. The higher the confidence level, the wider the CI; the larger the sample size, the narrower the CI; and the greater the variability of the data, the wider the CI.
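Because the full width of a z-based CI for a mean is 2 × z × SD/√n, the effect of each of the three factors can be checked numerically. The baseline SD of 10 and n of 100 are hypothetical; note that quadrupling the sample size only halves the width, because n enters through its square root.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sd, n, level=0.95):
    """Full width of a z-based CI for a mean: 2 × z × SD/√n."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sd / sqrt(n)


base = ci_width(sd=10, n=100)
print(round(base, 2))                                  # 3.92
print(round(ci_width(10, 100, level=0.99) / base, 2))  # higher confidence level -> 1.31 times wider
print(round(ci_width(10, 400) / base, 2))              # 4 times the sample size -> 0.5 (half as wide)
print(round(ci_width(20, 100) / base, 2))              # twice the SD -> 2.0 (twice as wide)
```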
Uses
- Population estimates derived from a given sample are influenced by randomness. The CI provides a range of plausible values for the population parameter, given the study data. It is expected to contain the true population value in a defined percentage of repeated samples.
- The width of the CI is a measure of the precision of a study.
- The CI supplements the P value by providing an estimate of the size of the actual clinical effect.
- Clinical studies are increasingly designed specifically as superiority, non-inferiority, or equivalence trials, and researchers base their conclusions regarding the comparison between the arms on CIs rather than on P values.
- Bootstrapping and Jackknifing are newer computer-based algorithms used to derive CIs in certain situations, for example, when generating and validating prognostic scores.
- The CI can help resolve controversies in medicine arising from P value–based conclusions.
Fallacies
The concept of the CI was introduced to address the uncertainty associated with the randomness of the study sample and the problems with study conclusions drawn entirely on the basis of statistical significance. However, the CI is not a panacea for all such issues. The CI is an inferential tool, used to draw inferences about the population from sample data, but it has no simple, intuitive interpretation and is often misinterpreted. Let us examine some common misinterpretations using study A, introduced earlier in the section on interpretation of the CI.
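The first incorrect interpretation could be that there is a 95% chance that the true difference in response rates lies between 20% and 40%. Once a particular interval has been calculated, it either contains the true value or it does not; the 95% refers to the long-run performance of the method, not to any single interval.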
The second incorrect interpretation could be that 95% of all the possible differences in response rates in the population lie between 20% and 40%.
The third incorrect interpretation could be that 95% of all study data values of the differences in response rates fall within the 20%–40% range.
The fourth incorrect interpretation could be thinking that the CI accounts for all possible sources of error. The CI reflects only random sampling error; there could be other sources of error, including an incorrect study design and biases in sample selection, data collection, and the conduct of the study.
The fifth fallacy is that the 95% confidence level is fixed or inherently special. The choice of confidence level is arbitrary; 95% is used only by convention.
The most acceptable interpretation of a 95% CI is that if random samples of the same size were repeatedly drawn from the population and a 95% CI were calculated from each sample, about 95% of these intervals would contain the true population parameter value.
Clinical Implications
When a new intervention becomes available, what an oncologist is really interested in is its clinical importance. Researchers, however, generally focus on statistical significance in the conclusions of their manuscripts. The CI acts as a bridge between statistical significance and clinical importance. The P value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true, that is, if chance alone were operating. Conventionally, a P value of less than 0.05 is taken to denote a finding that is “statistically significant,” and a P value of less than 0.01 is interpreted as denoting a finding that is “statistically highly significant.” A non-significant P value may reflect either a true lack of difference between the arms or an inadequate sample size. Based on the P value, trials have been dichotomously labeled as positive or negative. However, two important aspects can increase the clinical utility of study results: the strength of the evidence and whether the study is definitive or needs further confirmation. The CI addresses both these aspects. Interestingly, the CI can also be interpreted in terms of hypothesis testing: when a 95% CI excludes the no-effect value, the result is statistically significant at a two-sided alpha level of 0.05.
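The correspondence between the 95% CI and a test at the 0.05 level can be illustrated for an estimate whose sampling distribution is approximately normal. The sketch below, with hypothetical estimates and SEs, computes a two-sided z-test P value and the matching 95% CI, and the two verdicts agree. It assumes a difference measure with a no-effect value of 0; for ratio measures such as odds ratios, the same calculation is usually done on the log scale.

```python
from statistics import NormalDist


def z_test_and_ci(estimate, se, no_effect=0.0, level=0.95):
    """Two-sided z-test P value and the matching CI for an estimate with a given SE."""
    nd = NormalDist()
    z_stat = (estimate - no_effect) / se
    p = 2 * (1 - nd.cdf(abs(z_stat)))                # two-sided P value
    z_crit = nd.inv_cdf(1 - (1 - level) / 2)
    return p, (estimate - z_crit * se, estimate + z_crit * se)


for est, se in [(5.0, 2.0), (5.0, 3.2)]:             # hypothetical estimates and SEs
    p, (lo, hi) = z_test_and_ci(est, se)
    excludes = not (lo <= 0 <= hi)
    print(f"P = {p:.3f}, 95% CI {lo:.1f} to {hi:.1f}, "
          f"excludes 0: {excludes}, P < 0.05: {p < 0.05}")
```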
For example, in a trial comparing cytotoxic chemotherapy with immunotherapy in patients with metastatic urothelial carcinoma, the 2-year survival rate with immunotherapy was 5% higher than that with chemotherapy, with a 95% CI of −1.4% to +11%. As the lower bound crosses the no-effect value, a dichotomous interpretation might label this a negative trial. Nevertheless, to judge the strength of the evidence, we need to dissect this information further. Despite the lower bound crossing 0, values close to the point estimate (here, 5%) are more compatible with the data than values close to either bound. A more clinically relevant conclusion could therefore be that immunotherapy is a better choice than chemotherapy in patients with metastatic urothelial carcinoma, although the strength of the evidence is weak. Next, we need to consider whether the result is definitive or requires further study. The larger the sample size, the narrower the CI. In a so-called negative trial, the narrower CI of a repeat study with a larger sample size may pull the lower bound away from 0, but it will also pull the upper bound toward the point estimate. Hence, to determine whether a repeat trial with a larger sample size is needed, one must decide whether the upper bound of the CI represents a clinically significant effect. A two-sided 95% CI leaves only 2.5% in each tail, so true effects larger than the upper bound are considered implausible. If the upper bound is not clinically significant, the trial can be considered negative and definitive. Alternatively, if the upper bound of the 95% CI is clinically significant, the trial is negative but not definitive, and its result needs confirmation in a larger trial.
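The reasoning about definitive and non-definitive negative trials can be written as a small decision function. This is a hypothetical sketch: it assumes a benefit-type difference (larger values favour the new treatment) and a minimal clinically important difference (MCID) chosen by the reader; the 5 percentage points used below is an illustrative assumption, not a value from the trial.

```python
def interpret_trial(lower, upper, mcid, no_effect=0.0):
    """Interpret a 95% CI for a benefit-type difference (larger = better for the
    new treatment) against an assumed minimal clinically important difference."""
    if lower > no_effect:
        return "statistically significant benefit"
    if upper < no_effect:
        return "statistically significant harm"
    if upper < mcid:
        return "negative and definitive: even the upper bound is not clinically important"
    return "negative but not definitive: a clinically important benefit is not excluded"


# Urothelial carcinoma example: 2-year survival difference, 95% CI -1.4% to +11%.
# The MCID of 5 percentage points is a hypothetical choice for illustration.
print(interpret_trial(-1.4, 11.0, mcid=5.0))
```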
Another way to evaluate the clinical importance of an observed difference between two trial arms is to compare statistical significance with effect size. If the 95% CIs of the point estimates in the two arms do not overlap, the difference can be considered statistically significant. Conversely, overlapping CIs do not by themselves rule out statistical significance: a small overlap is compatible with a statistically significant difference that may nonetheless be too small to be clinically important. The CI thus converts a qualitative decision into a quantitative estimate of the clinical importance of an effect.
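A small numerical sketch, using hypothetical arm-level means and SEs, shows how two individual 95% CIs can overlap while the 95% CI for the difference between the arms still excludes 0. It assumes independent arms and normal approximations, so that the SE of the difference is √(SE_a² + SE_b²).

```python
from math import sqrt

Z95 = 1.96  # two-sided 95% critical value


def ci(mean, se, z=Z95):
    """Normal-approximation CI: mean ± z × SE."""
    return mean - z * se, mean + z * se


# Hypothetical arm-level results (mean and standard error in each arm)
mean_a, se_a = 10.0, 1.0
mean_b, se_b = 13.0, 1.0

ci_a, ci_b = ci(mean_a, se_a), ci(mean_b, se_b)
overlap = ci_a[1] > ci_b[0]          # upper bound of A (11.96) exceeds lower bound of B (11.04)

# For independent arms, SE of the difference = sqrt(SE_a^2 + SE_b^2)
diff = mean_b - mean_a
ci_diff = ci(diff, sqrt(se_a ** 2 + se_b ** 2))
print(overlap, [round(x, 2) for x in ci_diff])  # True [0.23, 5.77]
```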
The CI allows formal comparison between the arms in various types of studies, including superiority, non-inferiority, and equivalence trials. To do this, researchers set margins based on clinical relevance before calculating the sample size. To demonstrate equivalence, both bounds of the CI for the difference (or ratio) between the test and reference interventions must fall entirely within a predefined equivalence margin on either side of the line of no effect (for ratios, 80%–125% is a commonly used margin, for example, in bioequivalence studies). For non-inferiority studies, the lower limit of the 95% CI for the test intervention must not cross the predetermined non-inferiority margin. Similarly, to demonstrate superiority, the lower limit of the 95% CI for the test intervention must lie above the point of no effect or, when a superiority margin has been prespecified, above that margin.
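The decision rules for the three trial designs can be sketched as a small function for a CI of the difference (test minus reference) in which larger values favour the test intervention. The margins in the usage line are hypothetical, and the superiority rule assumes the common convention that the lower bound must exceed the no-effect value or a prespecified superiority margin. For ratio measures, `no_effect=1.0` and ratio margins such as `(0.80, 1.25)` can be passed instead.

```python
def assess_trial(lower, upper, no_effect=0.0, ni_margin=None,
                 equivalence=None, superiority_margin=None):
    """Compare a CI for (test - reference), where larger values favour the test
    arm, with prespecified margins. Returns a verdict for each design."""
    verdicts = {}
    if equivalence is not None:
        eq_low, eq_high = equivalence
        verdicts["equivalent"] = eq_low < lower and upper < eq_high   # CI fully inside
    if ni_margin is not None:
        verdicts["non_inferior"] = lower > ni_margin                  # margin below no effect
    threshold = no_effect if superiority_margin is None else superiority_margin
    verdicts["superior"] = lower > threshold
    return verdicts


# Hypothetical response-rate difference (percentage points): 95% CI -3 to +6,
# non-inferiority margin -10, equivalence margins -10 to +10.
print(assess_trial(-3.0, 6.0, ni_margin=-10.0, equivalence=(-10.0, 10.0)))
# {'equivalent': True, 'non_inferior': True, 'superior': False}
```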
Remedies and Recommendations
The CI is challenging to interpret appropriately, and it is not a one-stop solution for all statistical dilemmas. To make the best clinical use of data involving the CI, oncologists should keep the following points in mind:
- The CI is just another tool for drawing appropriate inferences from study data. It is not the final verdict: it does not protect against bias, and any single interval may still miss the true value by chance.
- Appropriate interpretation of the CI enables judicious clinical application of study results.
- Clinicians must read the Methods and Results sections of manuscripts thoroughly to avoid being misled by biased conclusions.
- Researchers must follow the Consolidated Standards of Reporting Trials (CONSORT) guidelines when reporting clinical trials.
Conclusion
CIs provide information about clinical importance beyond the usual focus on statistical significance. The narrower the CI, the better the precision of the estimate. The CI indicates the likely location and size of the true value, and CIs can be constructed for most kinds of data. Rather than replacing traditional ways of presenting data, including the P value and hazard ratio, the CI complements the information provided by these standard measures and helps in better interpretation of results.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Darling HS. To “P” or not to “P”, that is the question: A narrative review on P value. Cancer Res Stat Treat 2021;4:756-62.
McCormack J, Vandermeer B, Allan GM. How confidence intervals become confusion intervals. BMC Med Res Methodol 2013;13:134.
Trkulja V, Hrabač P. Confidence intervals: What are they to us, medical doctors? Croat Med J 2019;60:375-82.
Hazra A. Using the confidence interval confidently. J Thorac Dis 2017;9:4125-30.
Darling HS. Basics of statistics – 2: Types of clinical studies. Cancer Res Stat Treat 2020;3:100-9.
Neyman J. Fiducial argument and the theory of confidence intervals. Biometrika 1941;32:128-50.
Petrie A, Sabin C. Medical Statistics at a Glance. 3rd ed. Chichester, West Sussex, PO19 8SQ, UK: Wiley Blackwell; 2009.
Taylor C. Confidence Intervals: 4 Common Mistakes. ThoughtCo; 2020. Available from: thoughtco.com/confidence-interval-mistakes-3126405.
Andrade C. P values need to be correctly understood and read along with 95% confidence intervals. Cancer Res Stat Treat 2022;5:147-8.
Greenhalgh T. How to Read a Paper: The Basics of Evidence-Based Medicine. 5th ed. Chichester, West Sussex, PO19 8SQ, UK: Wiley Blackwell; 2014.
Schober P, Vetter T. Confidence intervals in clinical research. Anesth Analg 2020;130:1303.
Manjali JJ, Gupta T. Critical appraisal of a clinical research paper: What one needs to know. Cancer Res Stat Treat 2020;3:545-51.