



STATISTICAL RESOURCE 

Year : 2019  Volume : 2  Issue : 1  Page : 108-111

Testing and interpreting assumptions of COX regression analysis
Sampada Dessai^{1}, Vijay Patil^{2}
^{1} Department of Gynaecological Oncology, Sir H.N. Reliance Foundation Hospital and Research Centre, Mumbai, Maharashtra, India ^{2} Department of Medical Oncology, Tata Memorial Hospital, Mumbai, Maharashtra, India
Date of Web Publication  9-Sep-2019
Correspondence Address: Vijay Patil Department of Medical Oncology, Tata Memorial Hospital, Mumbai, Maharashtra India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/CRST.CRST_40_19
COX regression analysis, like any statistical test, is based on multiple assumptions. This is a guide on how to test these assumptions and how to interpret the results.
Keywords: COX proportional hazard model, COX regression, statistics, survival analysis
How to cite this article: Dessai S, Patil V. Testing and interpreting assumptions of COX regression analysis. Cancer Res Stat Treat 2019;2:108-11
Introduction   
This is the third article in the statistical resource section for performing a survival analysis.^{[1],[2]} Until now, we have discussed the method for estimating survival and the methods for comparing survival between groups. The COX regression analysis, like any statistical test, is based on multiple assumptions.^{[3],[4]} This manuscript details these assumptions and explains the tests used to check them. These assumptions should be tested routinely while performing COX regression analysis. A violation of these assumptions limits the applicability of COX regression analysis to the data.
Random Censoring   
It is assumed in COX regression analysis that the censoring in the data is random or non-informative. This means that the patients still under follow-up at time “t” (after excluding the patients who have had an event or have been censored) should be a random sample of the entire study population. This is an important assumption, and it cannot be tested statistically. The only way to assure it is via rigorous collection of data. The concept of censoring, with details about the types of censoring, was discussed in the previous statistical resource manuscript.^{[1]}
Proportional Hazard Assumption   
While performing COX regression analysis, the focus remains on obtaining the hazard ratio with its 95% confidence interval. The hazard ratio expresses the hazard of the event in the experimental arm relative to that in the standard arm. Speaking mathematically, it is the ratio of the hazard rates, with the hazard rate of the standard arm in the denominator. Thus, the hazard ratio at time “t” can be depicted mathematically as shown below in Equation 1.
Hazard ratio at time “t” = (Hazard rate at time “t” in the experimental arm) ÷ (Hazard rate at time “t” in the standard arm) Equation 1
The hazard rate is the instantaneous rate of occurrence of the event under consideration at time “t” among the patients who are still at risk at that time. It ignores the accumulation of hazard of occurrence of the event until that time point. The cumulative hazard until time “t” can be obtained by integrating the hazard rates until time “t.” When the hazards are proportional, the ratio of the cumulative hazards of the two arms equals the same constant hazard ratio.
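The quantities described above can be written compactly as follows. This is a sketch in standard survival-analysis notation that does not appear in the original text: T is the event time, the subscripts E and S denote the experimental and standard arms, and β is the regression coefficient for the arm.

```latex
% Hazard rate: instantaneous event rate at time t among patients still at risk
\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \]

% Cumulative hazard: the hazard rate integrated up to time t
\[ H(t) = \int_0^t h(u)\,du \]

% Proportional hazards: the two hazards differ only by a constant factor,
% so the ratio of hazards and the ratio of cumulative hazards coincide
\[ h_E(t) = e^{\beta}\, h_S(t)
   \quad\Longrightarrow\quad
   \frac{h_E(t)}{h_S(t)} = \frac{H_E(t)}{H_S(t)} = e^{\beta}
   \quad \text{for all } t \]
```

Under this assumption the hazard ratio is the single number e^β reported by the COX regression analysis. If the ratio changes with time, a single number no longer summarizes it.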
The proportional hazard assumption is that the hazard functions of the two groups remain proportional, which means that the hazard ratio is constant over time.^{[5]} This assumption should be tested routinely whenever COX regression analysis is applied. The methods available for testing it are
 Examination of the Kaplan–Meier curves: If the below-mentioned features are seen, then the probability of violation of this assumption is high
 There is a crossing of the Kaplan–Meier curves of the two groups
 The curve of one arm drops down, while the other plateaus.
 Scaled Schoenfeld residuals: These provide a statistical test and a graphical display for checking the proportional hazard assumption. This test cannot be done in SPSS software Version 20.0 (IBM Corp., Armonk, NY), and hence, we need to use alternative software. The example mentioned below is given for performing this test in R software (R Core Team, Vienna, Austria).
For demonstrating this test, we will use the raw data of a recently reported study of low-dose gemcitabine versus standard-dose gemcitabine.^{[4]} The progression-free survival (PFS) curves for low-dose gemcitabine and standard-dose gemcitabine crossed each other [Figure 1], and hence, the possibility of violation of the proportional hazard assumption was high.
 Figure 1: Protocol-specified progression-free survival curve. Please note the curves are crisscrossing
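The visual check of the Kaplan–Meier curves described above can be sketched in R as follows. The trial data are not available here, so this example assumes the `lung` dataset that ships with the "survival" package as a stand-in, with `sex` playing the role of the treatment arm. The object name `km_fit` is illustrative.

```r
library(survival)

# Assumption: the built-in `lung` data stand in for the trial data;
# `sex` (1 or 2) is used as a substitute for the randomized arm.
km_fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Draw one Kaplan-Meier curve per group. Crossing curves, or one curve
# dropping while the other plateaus, suggest non-proportional hazards.
plot(km_fit, col = c("blue", "red"), lty = 1:2,
     xlab = "Time (days)", ylab = "Survival probability")
legend("topright", legend = c("sex = 1", "sex = 2"),
       col = c("blue", "red"), lty = 1:2)
```

Visual inspection is only a screening step. The formal test with Schoenfeld residuals should still be carried out.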
The steps in R for performing the Schoenfeld residuals test are given below [Figure 2]:
 Figure 2: Snapshot of commands required in R for performing Schoenfeld residuals
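 Packages required: “rms” and “survminer.” These need to be loaded. The functions Surv, coxph, and cox.zph used below are provided by the “survival” package, which should be loaded as well if it is not attached automatically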
 Command: coxPFS <- coxph(Surv(PFS, Event_Progression_coded) ~ Arm). In this command, the function names “coxph” and “Surv” are part of the syntax and need to be kept constant, whereas the object name “coxPFS” and the variable names need to be changed as per the data analyzed. R is case sensitive, so the object name must be typed identically in all subsequent commands. The term “PFS” in the command denotes the time to progression. In our current example, it is PFS, but it could be any time-to-event variable such as overall survival, disease-free survival, or locoregional control. Event_Progression_coded denotes the event variable, which indicates whether progression has occurred or not. Unless specified otherwise, the software assumes that an event coded as “1” denotes that the event has happened. The term “Arm” stands for the arm to which the patient was randomized. The above command performs a COX regression analysis taking into account the arm as a variable. Multiple variables can also be included in the model. If multiple variables have to be included, then a “+” sign needs to be added before each additional variable. For example, if we were looking at the same command with age and gender included, the command would be coxPFS <- coxph(Surv(PFS, Event_Progression_coded) ~ Arm + Age + Gender)
 Command: summary(coxPFS). Even though the above-mentioned command leads to the performance of a COX regression analysis, the results are not displayed. Hence, this command is required
 Command: test.ph <- cox.zph(coxPFS, transform = "km", global = TRUE). This is the command for testing the proportionality assumption. This command performs the proportionality test but does not display it
 Command: test.ph. This command displays the result
 The result displayed is:
                                          rho      chisq   p
   ArmStd dose GemcitabineCarboplatin     0.0817   1.95    0.163
 The outcome of interest here is the P value. If P > 0.05, then there is no statistically significant evidence that the assumption is violated
 The graphical display of these results can be obtained by using the command: ggcoxzph(test.ph). The display is shown in [Figure 3]. In the figure, there is a solid line accompanied by two dashed lines. The solid line depicts the smoothing spline fit to the plot, while the dashed lines represent ± 2 standard error bands around the fit. The two red dotted lines above and below these lines represent the plot for the arm types, i.e., one is for the low-dose arm, while the other is for the standard-dose arm. Please note how both the plots nearly represent horizontal lines
 Thus, in the dataset used, the proportional hazard assumption was not violated [Figure 1]. Hence, COX regression analysis can be applied to these data. The Kaplan–Meier curves had crossed each other in the dataset, which means that the hazard rate varied between the two arms at different time intervals, but this did not violate the proportional hazard assumption. Crossing of curves assumes more significance if it happens in the early part of the curves rather than late. This is because, during the late part of the curve, the number of patients at risk is low, and a single event changes the hazard rate significantly
 When testing for multiple factors, as in the command which included arm, age, and gender, the Schoenfeld plots are made for each variable with the above commands. The assumption of proportionality needs to be met by all variables
 If the assumptions are not met, then either an extended COX regression analysis or a time-dependent COX regression analysis is required.
 Figure 3: Schoenfeld residuals plot. In the figure, there is a solid line accompanied by two dashed lines. The solid line depicts the smoothing spline fit to the plot, and the dashed lines represent ± 2 standard error bands around the fit. The two red dotted lines above and below these lines represent the plot for the arm types, i.e., one is for the low-dose arm, while the other is for the standard-dose arm. Please note how both the plots nearly represent a horizontal line. This means the data do not violate the proportional hazard assumption
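The steps listed above can be put together in a single runnable script, sketched below. Because the trial data are not available here, the example assumes the `lung` dataset from the "survival" package as a stand-in, with `sex` in place of the arm and `age` as a second covariate. The object names `cox_fit` and `test_ph` are illustrative. The call to ggcoxzph is attempted only if the "survminer" package is installed.

```r
library(survival)

# Assumption: the built-in `lung` data replace the trial data;
# `sex` substitutes for Arm and `age` is an additional covariate.
cox_fit <- coxph(Surv(time, status) ~ sex + age, data = lung)

# Display the hazard ratios with their 95% confidence intervals.
print(summary(cox_fit))

# Test the proportional hazard assumption using scaled Schoenfeld
# residuals; global = TRUE adds an overall test across all covariates.
test_ph <- cox.zph(cox_fit, transform = "km", global = TRUE)
print(test_ph)

# Base-graphics display: one panel per covariate, each showing the
# smoothed residual trend with its confidence band.
plot(test_ph)

# Optional ggplot2-based display, only if survminer is available.
if (requireNamespace("survminer", quietly = TRUE)) {
  print(survminer::ggcoxzph(test_ph))
}
```

The columns printed by cox.zph depend on the installed version of the "survival" package. Older versions report rho, chisq, and p as in the output above, whereas newer versions report chisq, df, and p. The P value is read in the same way in both cases.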
Testing Nonlinearity   
While performing a multivariate analysis for survival, when we introduce a continuous variable in the model, we assume that the continuous covariate has a linear relationship with the log hazard. However, this assumption again needs to be confirmed. The assumption can be checked by plotting the Martingale residuals on the Y-axis against the continuous covariate on the X-axis. The method for performing this is given below:
 We go back to the previous dataset of the low-dose versus standard-dose gemcitabine study. Command: resid <- residuals(coxPFS, type = "martingale")
 Command: plot(Age, resid, xlab = "Age", ylab = "Martingale Residuals"). This command gives us [Figure 4]. The plot seen has to be horizontal and not angling. If it is angling, then the linearity assumption is not met, and the variable needs to be transformed. The most common transformation employed is logarithmic
 The interpretation of the plot is subjective, and this is a limitation of this method.
 Figure 4: Martingale residuals plot. The plot seen must be horizontal and not angling. If it is angling, then the linearity assumption is not met, and the variable needs to be transformed
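The Martingale residual check described above can be sketched as follows. As the trial data are not available here, the example again assumes the `lung` dataset from the "survival" package, with `age` as the continuous covariate. The object names `cox_fit` and `mart_res` are illustrative. A lowess smoother, which is not part of the commands above, is added to make any trend easier to judge.

```r
library(survival)

# Assumption: the built-in `lung` data replace the trial data;
# `age` is the continuous covariate whose linear form is checked.
cox_fit <- coxph(Surv(time, status) ~ sex + age, data = lung)

# Martingale residuals: observed minus expected number of events
# for each patient under the fitted model.
mart_res <- residuals(cox_fit, type = "martingale")

# Residuals against the covariate. A roughly horizontal smoother
# around zero supports the linear form; a clear slope or curve
# suggests that the covariate should be transformed.
plot(lung$age, mart_res, xlab = "Age", ylab = "Martingale residuals")
lines(lowess(lung$age, mart_res), col = "red")
abline(h = 0, lty = 2)
```

If the smoother shows a trend, the model can be refitted with a transformed covariate, for example log(age), and the plot repeated.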
Conclusion   
The testing of assumptions is simple, can easily be performed in R software, and enables statistically correct interpretation of the data.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References   
1.  Chakraborty S. A stepwise guide to performing survival analysis. Cancer Res Stat Treat 2018;1:41. [Full text] 
2.  Dessai S, Simha V, Patil V. Stepwise cox regression analysis in SPSS. Cancer Res Stat Treat 2018;1:167. [Full text] 
3.  Patil VM, Noronha V, Joshi A, Agarwal J, Ghosh-Laskar S, Budrukkar A, et al. A randomized phase 3 trial comparing nimotuzumab plus cisplatin chemoradiotherapy versus cisplatin chemoradiotherapy alone in locally advanced head and neck cancer. Cancer 2019;114. 
4.  Patil V, Noronha V, Joshi A, Chougule A, Kannan S, Bhattacharjee A, et al. Phase III non-inferiority study evaluating efficacy and safety of low dose gemcitabine compared to standard dose gemcitabine with platinum in advanced squamous lung cancer. EClinicalMedicine 2019;9:19-25. 
5.  Cox DR. Analysis of Survival Data. Publisher Chapman and Hall, London-New York: Routledge; 2018. 



