|Year : 2019 | Volume
| Issue : 1 | Page : 108-111
Testing and interpreting assumptions of COX regression analysis
Sampada Dessai1, Vijay Patil2
1 Department of Gynaecological Oncology, Sir H.N. Reliance Foundation Hospital and Research Centre, Mumbai, Maharashtra, India
2 Department of Medical Oncology, Tata Memorial Hospital, Mumbai, Maharashtra, India
|Date of Web Publication||9-Sep-2019|
Department of Medical Oncology, Tata Memorial Hospital, Mumbai, Maharashtra
Source of Support: None, Conflict of Interest: None
Like any statistical test, COX regression analysis rests on several assumptions. This article is a guide to testing those assumptions and interpreting the results.
Keywords: COX proportional hazard model, COX regression, statistics, survival analysis
|How to cite this article:|
Dessai S, Patil V. Testing and interpreting assumptions of COX regression analysis. Cancer Res Stat Treat 2019;2:108-11
| Introduction|| |
This is the third article in the statistical resource section on performing a survival analysis. So far, we have discussed the method for estimating survival and the methods for comparing survival between groups. COX regression analysis, like any statistical test, is based on multiple assumptions. This manuscript details these assumptions and explains the tests used to check them. The assumptions should be tested routinely whenever COX regression analysis is performed, because a violation limits the applicability of COX regression analysis to the data.
| Random Censoring|| |
COX regression analysis assumes that the censoring in the data is random, or noninformative. This means that the patients still under follow-up at time “t” (that is, after dropping the patients who have had an event or have been censored) should be a random sample of the entire study population. This is an important assumption, and it cannot be tested statistically. The only way to safeguard it is through rigorous collection of data. The concept of censoring, with details of the types of censoring, was discussed in the previous statistical resource manuscript.
| Proportional Hazard Assumption|| |
While performing COX regression analysis, the focus remains on obtaining the hazard ratio with its 95% confidence interval. The hazard ratio gives the rate at which the event occurs in the experimental arm relative to the standard arm. Speaking mathematically, it is the ratio of the hazard rates, with the hazard rate of the standard arm in the denominator. When the hazards are proportional, the ratio of the cumulative hazard rates takes the same value, and this is the form depicted in Equation 1.
Hazard ratio at time “t” = (Cumulative hazard rate at time “t” in the experimental arm) ÷ (Cumulative hazard rate at time “t” in the standard arm) (Equation 1)
The hazard rate is the instantaneous rate of occurrence of the event under consideration at a given time, among patients who have not yet had the event. It ignores the accumulation of hazard until that time point. The cumulative hazard rate until time “t” is obtained by integrating the hazard rates until time “t”; the cumulative hazard rate at time “t” in Equation 1 is this integrated value.
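The relationship between the hazard rate, the cumulative hazard rate, and the hazard ratio can be written out in standard survival notation. The symbols below (T for the event time, h for the hazard rate, H for the cumulative hazard rate, the subscripts E and S for the experimental and standard arms, and θ for the constant ratio) are introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
h(t) &= \lim_{\Delta t \to 0}
        \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
&
H(t) &= \int_0^t h(u)\,du \\[4pt]
\mathrm{HR}(t) &= \frac{h_E(t)}{h_S(t)} \\[4pt]
h_E(t) = \theta\, h_S(t) \ \text{for all } t
&\;\Longrightarrow\;
H_E(t) = \int_0^t \theta\, h_S(u)\,du = \theta\, H_S(t)
\;\Longrightarrow\;
\frac{H_E(t)}{H_S(t)} = \theta
\end{align}
```

The last line shows why, when the hazards are proportional, the ratio of hazard rates and the ratio of cumulative hazard rates give the same constant θ.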
The proportional hazard assumption is that the hazard functions of the two groups remain proportional to each other, which means that the hazard ratio is constant over time. This assumption should be tested routinely before COX regression analysis is applied. The methods available for checking it are:
- Examination of the Kaplan–Meier curves: If the below-mentioned features are seen, then the probability of violation of this assumption is high
- There is a crossing of the Kaplan–Meier curves of the two groups
- The curve of one arm drops down, while the other plateaus.
- Scaled Schoenfeld residuals: These provide both a statistical test and a graphical display for checking the proportional hazard assumption. This test cannot be done in SPSS software Version 20.0 (IBM Corp., Armonk, NY), and hence, we need to use alternative software. The example given below performs this test in R software (R Core Team, Vienna, Austria).
For demonstrating this test, we will use the raw data of a recently reported study of low-dose gemcitabine versus standard-dose gemcitabine. The progression-free survival (PFS) curves for low-dose gemcitabine and standard-dose gemcitabine crossed each other [Figure 1], and hence, the possibility of violation of the proportional hazard assumption was high.
|Figure 1: Protocol-specified progression-free survival curve. Please note the curves are criss-crossing|
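The visual check of the Kaplan–Meier curves can be sketched in R as follows. This is a minimal sketch and not the authors' code: the trial data are not available here, so the “lung” dataset shipped with the “survival” package is used as a stand-in, with “sex” playing the role of “Arm.” The names d, event, and km are illustrative.

```r
library(survival)

# Stand-in data (assumption): `lung` from the survival package,
# with `sex` used in place of the trial arm.
d <- survival::lung
d$event <- as.integer(d$status == 2)  # lung codes death as 2, censoring as 1

# Kaplan-Meier estimate, one curve per group.
km <- survfit(Surv(time, event) ~ sex, data = d)

# Plot both curves; look for crossing, or for one curve reaching a
# plateau while the other keeps dropping.
plot(km, col = c("blue", "red"),
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = names(km$strata),
       col = c("blue", "red"), lty = 1)
```

Crossing curves in such a plot are only a prompt to run the formal test; as the example in this article shows, they do not by themselves prove that the assumption is violated.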
The steps in R for doing this test are given below [Figure 2]:
|Figure 2: Snapshot of commands required in R for performing Schoenfeld residuals|
- Packages required: “rms” and “survminer.” These need to be loaded. The functions coxph and cox.zph used below are provided by the “survival” package, which must also be loaded if it is not attached automatically
- Command: coxPFS <- coxph(Surv(PFS, Event_Progression_coded) ~ Arm). In this command, the function names coxph and Surv are part of the command and need to be kept constant (they appear in black font in the original snapshot). The variable names PFS, Event_Progression_coded, and Arm (in blue font in the snapshot) need to be changed as per the data analyzed. The term “PFS” in the command denotes the time to progression. In our current example, it is PFS, but it could be any time-to-event variable such as overall survival, disease-free survival, or locoregional control. Event_Progression_coded is the event variable, which indicates whether progression has occurred or not. Unless specified otherwise, the software assumes that an event coded as “1” denotes that the event has happened. The term “Arm” stands for the arm to which the patient was randomized. The above command performs a COX regression analysis with the arm as a variable. Multiple variables can also be included in the model. If multiple variables have to be included, then a “+” sign needs to be added before each additional variable. For example, if we were looking at the same command with age and gender included, the command would be coxPFS <- coxph(Surv(PFS, Event_Progression_coded) ~ Arm + Age + Gender). R is case-sensitive, so the object name coxPFS must be typed identically in every command
- Command: summary(coxPFS). Even though the above-mentioned command performs the COX regression analysis, the results are not displayed. Hence, this command is required
- Command: test.ph <- cox.zph(coxPFS, transform = "km", global = TRUE). This is the command for testing the proportionality assumption. It performs the proportionality test but does not display the result
- Command: test.ph. This command displays the result
- The result displayed for the term “ArmStd dose Gemcitabine-Carboplatin” is: rho = -0.0817, chisq = 1.95, p = 0.163
- The outcome of interest here is the P value. If P > 0.05, there is no evidence that the assumption is violated
- The graphical display of these results can be obtained with the command: ggcoxzph(test.ph). The display is shown in [Figure 3]. In the figure, there is a solid line accompanied by two dashed lines. The solid line depicts the smoothing spline fit to the plot, while the dashed lines represent the ± 2 standard error bands around the fit. The two red dotted lines above and below these lines represent the plots for the two arms, i.e., one is for the low-dose arm, while the other is for the standard-dose arm. Please note how both plots are nearly horizontal lines
- Thus, in the dataset used, the proportional hazard assumption was not violated [Figure 1]. Hence, COX regression analysis can be applied to these data. The Kaplan–Meier curves crossed each other in this dataset, which means that the hazard rate differed between the two arms at different time intervals, but not enough to violate the proportional hazard assumption. Crossing of curves assumes more significance if it happens in the early part of the curves rather than late. This is because, in the late part of the curve, the number of patients at risk is low, and a single event changes the hazard rate significantly
- When testing multiple factors, as in the command which included arm, age, and gender, the Schoenfeld plots are produced for each variable by the commands above. The assumption of proportionality needs to be met by every variable
- If the assumptions are not met, then either an extended COX regression analysis or time-dependent COX regression analysis is required.
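The steps listed above can be put together as one short script. This is a sketch under stated assumptions rather than the authors' exact session: the “lung” dataset from the “survival” package stands in for the trial data, “sex” stands in for “Arm,” and “age” is added to show the multiple-variable case. Recent versions of the “survival” package print the columns chisq, df, and p (without rho), so the output layout may differ from the one quoted above.

```r
library(survival)

# Stand-in data (assumption): `lung`, with `sex` in place of the trial arm.
d <- survival::lung
d$event <- as.integer(d$status == 2)  # 1 = event happened, 0 = censored

# Fit the Cox model and display hazard ratios with 95% confidence intervals.
coxPFS <- coxph(Surv(time, event) ~ sex + age, data = d)
print(summary(coxPFS))

# Schoenfeld residual test: one row per covariate plus a GLOBAL row.
test.ph <- cox.zph(coxPFS, transform = "km", global = TRUE)
print(test.ph)

# Base-graphics residual plots, one panel per covariate. With the
# survminer package installed, ggcoxzph(test.ph) gives a ggplot2 version.
par(mfrow = c(1, 2))
plot(test.ph)
```

Each row of the printed table is read in the same way as the single-variable result: a P value above 0.05 gives no evidence against proportionality for that covariate, and the GLOBAL row summarizes the model as a whole.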
|Figure 3: Schoenfeld residuals plot. In the figure, there is a solid line accompanied by two dashed lines. The solid line depicts the smoothing spline fit to the plot, and the dashed lines represent the ± 2 standard error bands around the fit. The two red dotted lines above and below these lines represent the plots for the two arms, i.e., one is for the low-dose arm, while the other is for the standard-dose arm. Please note how both plots are nearly horizontal lines. This means the data do not violate the proportional hazard assumption|
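For the case where the proportionality check fails, one possible form of a time-dependent analysis is to let the coefficient of the offending variable change with time. The sketch below uses the tt() facility of coxph in the “survival” package; this is one way of doing it, not necessarily the approach the authors have in mind, and the choice of log(t) as the time function is an assumption made purely for illustration. The “lung” dataset again stands in for the trial data.

```r
library(survival)

# Stand-in data (assumption): `lung`, with `sex` in place of the trial arm.
d <- survival::lung
d$event <- as.integer(d$status == 2)

# tt(sex) adds a term sex * log(t), so the log hazard ratio for sex is
# beta1 + beta2 * log(t) rather than a single constant.
fit_tt <- coxph(Surv(time, event) ~ sex + tt(sex), data = d,
                tt = function(x, t, ...) x * log(t))
print(fit_tt)
```

A coefficient for the tt(sex) term that is clearly different from zero would indicate that the effect changes over time, in which case a single hazard ratio no longer summarizes the comparison.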
| Testing Nonlinearity|| |
While performing a multivariate analysis for survival, when we introduce a continuous variable in the model, we assume that the continuous covariate has a linear relationship with the log hazard. However, this assumption again needs to be confirmed. It can be checked by plotting the Martingale residuals on the Y-axis against the continuous covariate on the X-axis. The method for performing this is given below:
- We go back to the previous dataset of the low-dose versus standard-dose gemcitabine study. Command: resid <- residuals(coxPFS, type = "martingale")
- Command: plot(Age, resid, xlab = "Age", ylab = "Martingale Residuals"). This command gives us [Figure 4]. The trend of the points has to be horizontal and not sloping. If it slopes or curves, then the linearity assumption is not met, and the variable needs to be transformed. The most common transformation employed is logarithmic
- The interpretation of the plot is subjective, and this is a limitation of this method.
|Figure 4: Martingale residuals plot. The trend of the points must be horizontal and not sloping. If it slopes or curves, then the linearity assumption is not met, and the variable needs to be transformed|
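The linearity check can be sketched in R as follows. As before, this is a sketch and not the authors' session: the “lung” dataset from the “survival” package is a stand-in for the trial data, and the names fit, resid_mart, and fit_log are illustrative. The lowess smoother is an addition that is not in the commands above; it is drawn only to make the trend easier to judge.

```r
library(survival)

# Stand-in data (assumption): `lung`, with `sex` in place of the trial arm.
d <- survival::lung
d$event <- as.integer(d$status == 2)

# Model containing the continuous covariate of interest (age).
fit <- coxph(Surv(time, event) ~ sex + age, data = d)
resid_mart <- residuals(fit, type = "martingale")

# Residuals against the covariate, with a smoother and a zero line.
plot(d$age, resid_mart, xlab = "Age", ylab = "Martingale Residuals")
lines(lowess(d$age, resid_mart), col = "red")
abline(h = 0, lty = 2)

# If the smoother slopes or bends, refit with a transformed covariate,
# for example the logarithm, and repeat the check.
fit_log <- coxph(Surv(time, event) ~ sex + log(age), data = d)
print(fit_log)
```

The smoother does not remove the subjectivity of the reading, but it gives a consistent reference when comparing the plot before and after a transformation.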
| Conclusion|| |
The testing of assumptions is simple, can easily be performed in R software, and enables statistically correct interpretation of the data.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Chakraborty S. A step-wise guide to performing survival analysis. Cancer Res Stat Treat 2018;1:41. [Full text]
Dessai S, Simha V, Patil V. Stepwise cox regression analysis in SPSS. Cancer Res Stat Treat 2018;1:167. [Full text]
Patil VM, Noronha V, Joshi A, Agarwal J, Ghosh-Laskar S, Budrukkar A, et al. A randomized phase 3 trial comparing nimotuzumab plus cisplatin chemoradiotherapy versus cisplatin chemoradiotherapy alone in locally advanced head and neck cancer. Cancer 2019;1-14.
Patil V, Noronha V, Joshi A, Chougule A, Kannan S, Bhattacharjee A, et al. Phase III non-inferiority study evaluating efficacy and safety of low dose gemcitabine compared to standard dose gemcitabine with platinum in advanced squamous lung cancer. EClinicalMedicine 2019;9:19-25.
Cox DR. Analysis of Survival Data. London, New York: Chapman and Hall/Routledge; 2018.