Year : 2019  |  Volume : 2  |  Issue : 1  |  Page : 108-111

Testing and interpreting assumptions of COX regression analysis

1 Department of Gynaecological Oncology, Sir H.N. Reliance Foundation Hospital and Research Centre, Mumbai, Maharashtra, India
2 Department of Medical Oncology, Tata Memorial Hospital, Mumbai, Maharashtra, India

Date of Web Publication: 9-Sep-2019

Correspondence Address:
Vijay Patil
Department of Medical Oncology, Tata Memorial Hospital, Mumbai, Maharashtra

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/CRST.CRST_40_19

COX regression analysis, like any statistical test, is based on multiple assumptions. This guide explains how to test these assumptions and how to interpret the results.

Keywords: COX proportional hazard model, COX regression, statistics, survival analysis

How to cite this article:
Dessai S, Patil V. Testing and interpreting assumptions of COX regression analysis. Cancer Res Stat Treat 2019;2:108-11

How to cite this URL:
Dessai S, Patil V. Testing and interpreting assumptions of COX regression analysis. Cancer Res Stat Treat [serial online] 2019 [cited 2022 Aug 19];2:108-11. Available from: https://www.crstonline.com/text.asp?2019/2/1/108/266461

Introduction

This is the third article in the statistical resource section on performing a survival analysis.[1],[2] Until now, we have discussed the method for estimating survival and the methods for comparing survival between groups. COX regression analysis, like any statistical test, is based on multiple assumptions.[3],[4] This manuscript details these assumptions and explains the tests used to check them. These assumptions should be tested routinely while performing COX regression analysis. A violation of these assumptions limits the applicability of COX regression analysis to the data.

Random Censoring

COX regression analysis assumes that the censoring in the data is random or noninformative. This means that the patients still in follow-up at time “t” (i.e., after excluding those who have had an event or have been censored) should be a random sample of the entire study population. This is an important assumption, and it cannot be tested statistically. The only way to ensure it is through rigorous collection of data. The concept of censoring and the details of the types of censoring were discussed in a previous statistical resource manuscript.[1]

Proportional Hazard Assumption

While performing COX regression analysis, the focus remains on obtaining the hazard ratio with its 95% confidence interval. The hazard ratio gives the relative likelihood of an event happening in the experimental arm with respect to the standard arm. Mathematically, it is expressed here as a ratio of cumulative hazard rates, with the cumulative hazard rate of the standard arm in the denominator; when the hazards are proportional, this ratio equals the ratio of the instantaneous hazard rates. Thus, the hazard ratio at time “t” can be written as shown in Equation 1.

Hazard ratio at time “t” = (Cumulative hazard rate at time “t” in the experimental arm) ÷ (Cumulative hazard rate at time “t” in the standard arm) (Equation 1)

The hazard rate is the instantaneous rate of occurrence of the event under consideration at a given time point among patients who are still at risk. It ignores the hazard that has accumulated up to that time point. The cumulative hazard rate until time “t” is obtained by integrating the hazard rates until time “t.” The cumulative hazard rate at time “t” shown in Equation 1 is this integrated value.
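
In symbols, the quantities described above can be written as follows. This is a sketch in standard survival-analysis notation; the symbols h, H, T, and θ and the subscripts E (experimental arm) and S (standard arm) are introduced here for illustration and do not appear in the original text. The last line shows why, when the hazards of the two arms stay in a constant proportion θ, the ratio of cumulative hazard rates in Equation 1 does not change with time.

```latex
% Hazard rate: instantaneous event rate at time t among patients still at risk
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Cumulative hazard rate: hazard accumulated up to time t
H(t) = \int_0^t h(u)\, du

% Equation 1 in this notation (E = experimental arm, S = standard arm)
\mathrm{HR}(t) = \frac{H_E(t)}{H_S(t)}

% If h_E(u) = \theta\, h_S(u) for every u, the ratio no longer depends on t
\frac{H_E(t)}{H_S(t)} = \frac{\int_0^t \theta\, h_S(u)\, du}{\int_0^t h_S(u)\, du} = \theta
```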

The proportional hazard assumption is that the hazard functions of the two groups remain proportional, which means that the hazard ratio is constant over time.[5] This assumption should be tested routinely before COX regression analysis is applied. The methods available for testing it are:

  1. Examination of the Kaplan–Meier curves: If either of the following features is seen, the probability that this assumption is violated is high:

    1. There is a crossing of the Kaplan–Meier curves of the two groups
    2. The curve of one arm drops down, while the other plateaus.

  2. Scaled Schoenfeld residuals: These provide a statistical test and a graphical display for checking the proportional hazard assumption. This test cannot be performed in SPSS software Version 20.0 (IBM Corp., Armonk, NY), and hence, alternative software is needed. The example below shows how to perform the test in R software (R Core Team, Vienna, Austria).

For demonstrating this test, we will use the raw data of a recently reported study of low-dose gemcitabine versus standard-dose gemcitabine.[4] The progression-free survival (PFS) curves for low-dose gemcitabine and standard-dose gemcitabine crossed each other [Figure 1], and hence, the possibility of violation of the proportional hazard assumption was high.
Figure 1: Protocol-specified progression-free survival curve. Please note the curves are criss-crossing
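
The visual check of the Kaplan–Meier curves described above can be sketched in R as follows. The raw trial data are not part of this article, so this sketch assumes the built-in “lung” dataset of the “survival” package as a stand-in, with its variable “sex” playing the role of the treatment arm; the object names lung_data and km are illustrative only.

```r
library(survival)                      # Surv(), survfit()

# Stand-in data (assumption): time = follow-up in days,
# status = 1 (censored) or 2 (died), sex = 1 or 2.
lung_data <- survival::lung

# Kaplan-Meier estimate for each group.
km <- survfit(Surv(time, status) ~ sex, data = lung_data)

# Plot the curves and look for crossing, or for one curve
# dropping while the other plateaus.
plot(km, col = c("blue", "red"), xlab = "Days", ylab = "Survival probability")
legend("topright", legend = names(km$strata), col = c("blue", "red"), lty = 1)
```

With the trial data, the formula would instead use the time, event, and arm variables named in the commands that follow.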

The steps in R for doing this test are given below [Figure 2]:
Figure 2: Snapshot of commands required in R for performing Schoenfeld residuals

  1. Packages required: “rms” and “survminer.” These need to be loaded. The functions Surv, coxph, and cox.zph used below belong to the “survival” package, which should also be loaded if it is not attached automatically
  2. Command: coxPFS<-coxph(Surv(PFS,Event_Progression_coded)~Arm). In Figure 2, the words of this command shown in black font are part of the function and need to be kept constant, whereas the words shown in blue font need to be changed as per the data analyzed. The term “PFS” in the command denotes the time to progression. In our current example, it is PFS, but it could be any time-to-event variable such as overall survival, disease-free survival, or locoregional control. Event_Progression_coded denotes the variable that indicates whether progression has occurred or not. Unless specified otherwise, the software assumes that an event coded as “1” denotes that the event has happened. The term “Arm” stands for the arm to which the patient was randomized. The above command performs a COX regression analysis with the arm as a variable. R is case-sensitive, so the object name (here, coxPFS) must be typed identically in every later command. Multiple variables can also be included in the model. If multiple variables have to be included, then a “+” sign needs to be added before each additional variable. For example, if we were looking at the same command with age and gender included, the command would be coxPFS<-coxph(Surv(PFS,Event_Progression_coded)~Arm + Age + Gender)
  3. Command: summary(coxPFS). Even though the above-mentioned command performs the COX regression analysis, the results are not displayed. Hence, this command is required
  4. Command: test.ph<-cox.zph(coxPFS, transform="km", global=TRUE). This is the command for testing the proportionality assumption. This command performs the proportionality test but does not display the result
  5. Command: test.ph. This command displays the result
  6. The result displayed for the term “ArmStd dose Gemcitabine-Carboplatin” is: rho = -0.0817, chisq = 1.95, p = 0.163
  7. The outcome of interest here is the P value. If P > 0.05, there is no evidence that the assumption is violated; in this example, P = 0.163
  8. The graphical display of these results can be obtained by using the command: ggcoxzph(test.ph). The display is shown in [Figure 3]. In the figure, there is a solid line accompanied by two dashed lines. The solid line depicts the smoothing spline fit to the plot, while the dashed lines represent the ± 2 standard error bands around the fit. The two red dotted lines above and below these lines represent the plots for the two arms, i.e., one is for the low-dose arm, while the other is for the standard-dose arm. Please note how both plots are nearly horizontal
  9. Thus, in the dataset used, the proportional hazard assumption was not violated [Figure 3]. Hence, COX regression analysis can be applied to these data. The Kaplan–Meier curves had crossed each other in this dataset [Figure 1], which means that the hazard rate varied between the two arms at different time intervals, but not enough to violate the proportional hazard assumption. Crossing of curves assumes more significance if it happens in the early part of the curves rather than the late part. This is because, during the late part of the curve, the number of patients at risk is low, and a single event changes the hazard rate significantly
  10. When testing multiple factors, as in the command that included arm, age, and gender, a Schoenfeld plot is produced for each variable with the above commands. The assumption of proportionality needs to be met by all variables
  11. If the assumptions are not met, then either an extended COX regression analysis or time-dependent COX regression analysis is required.
Figure 3: Schoenfeld residuals plot. In the figure, there is a solid line accompanied by two dashed lines. The solid line depicts the smoothing spline fit to the plot, and the dashed lines represent the ± 2 standard error bands around the fit. The two red dotted lines above and below these lines represent the plots for the two arms, i.e., one is for the low-dose arm, and the other is for the standard-dose arm. Please note how both plots are nearly horizontal. This means that the data do not violate the proportional hazard assumption
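
The steps listed above can be put together as a short R script. This is a minimal sketch and not the authors' original code: because the trial data are not included in this article, it assumes the built-in “lung” dataset of the “survival” package as a stand-in, so the variables time, status, sex, and age and the object names lung_data and coxOS are illustrative. The functions coxph, cox.zph, and ggcoxzph are the same ones used in the steps.

```r
library(survival)                      # Surv(), coxph(), cox.zph()

# Stand-in data (assumption): time = follow-up in days,
# status = 1 (censored) or 2 (died), sex = 1 or 2, age in years.
lung_data <- survival::lung

# Steps 2-3: fit the COX model and display hazard ratios with 95% CIs.
coxOS <- coxph(Surv(time, status) ~ sex + age, data = lung_data)
print(summary(coxOS))

# Steps 4-5: test proportionality for each variable and for the model
# as a whole (the GLOBAL row), then display the result.
test.ph <- cox.zph(coxOS, transform = "km", global = TRUE)
print(test.ph)

# Step 8: plot the scaled Schoenfeld residuals, one panel per variable.
if (requireNamespace("survminer", quietly = TRUE)) {
  print(survminer::ggcoxzph(test.ph))
} else {
  plot(test.ph)                        # base-graphics plot from "survival"
}
```

The printed table may look different from the result quoted in step 6: recent versions of the “survival” package report chisq, df, and p for each term and no longer show the rho column. The P value is read in the same way.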

Testing Nonlinearity

While performing a multivariate analysis for survival, when we introduce a continuous variable into the model, we assume that the continuous covariate has a linear form, i.e., a linear relationship with the log hazard. This assumption also needs to be confirmed. It can be checked by plotting the Martingale residuals on the Y-axis against the continuous covariate on the X-axis. The method for performing this is given below:

  1. We go back to the previous dataset of the low-dose versus standard-dose gemcitabine study. Command: resid<-residuals(coxPFS, type="martingale")
  2. Command: plot(Age, resid, xlab="Age", ylab="Martingale Residuals"). This command gives us [Figure 4]. The plot should be roughly horizontal and not angled. If it is angled, then the linearity assumption is not met, and the variable needs to be transformed. The most common transformation employed is logarithmic
  3. The interpretation of the plot is subjective, and this is a limitation of this method.
Figure 4: Martingale residuals plot. The plot should be roughly horizontal and not angled. If it is angled, then the linearity assumption is not met, and the variable needs to be transformed
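
The two commands above can be sketched as a self-contained R script. As before, this assumes the built-in “lung” dataset of the “survival” package in place of the trial data, so the variable names and the object names lung_data, coxOS, and mres are illustrative. The lowess smoother is an addition that is not part of the original commands; it is included only to make a trend in the scatter easier to see.

```r
library(survival)                      # Surv(), coxph()

# Stand-in data (assumption): time, status (1 = censored, 2 = died),
# sex, and the continuous covariate age.
lung_data <- survival::lung

# Fit the model without the continuous covariate being checked.
coxOS <- coxph(Surv(time, status) ~ sex, data = lung_data)

# Martingale residuals: observed minus expected events for each patient.
mres <- residuals(coxOS, type = "martingale")

# Plot the residuals against age; the scatter should show no clear trend.
plot(lung_data$age, mres, xlab = "Age", ylab = "Martingale Residuals")

# Smoothed trend line (addition for readability, not in the original steps).
lines(lowess(lung_data$age, mres), col = "red")
```

The residual is named mres here rather than resid so that it does not mask the R function of the same name.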

Conclusion

The testing of assumptions is simple, can easily be performed in R software, and enables statistically correct interpretation of the data.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.

References

1. Chakraborty S. A step-wise guide to performing survival analysis. Cancer Res Stat Treat 2018;1:41.
2. Dessai S, Simha V, Patil V. Stepwise cox regression analysis in SPSS. Cancer Res Stat Treat 2018;1:167.
3. Patil VM, Noronha V, Joshi A, Agarwal J, Ghosh-Laskar S, Budrukkar A, et al. A randomized phase 3 trial comparing nimotuzumab plus cisplatin chemoradiotherapy versus cisplatin chemoradiotherapy alone in locally advanced head and neck cancer. Cancer 2019;1-14.
4. Patil V, Noronha V, Joshi A, Chougule A, Kannan S, Bhattacharjee A, et al. Phase III non-inferiority study evaluating efficacy and safety of low dose gemcitabine compared to standard dose gemcitabine with platinum in advanced squamous lung cancer. EClinicalMedicine 2019;9:19-25.
5. Cox DR. Analysis of Survival Data. London-New York: Chapman and Hall, Routledge; 2018.

