_images/logo.png

Competing risks models

What are competing risks models?

Competing risks models are a combination of two or more distributions that represent failure modes which are “competing” to end the life of the system being modelled. This model is similar to a mixture model in the sense that it uses multiple distributions to create a new model that has a shape with more flexibility than a single distribution. However, unlike in mixture models, we are not adding proportions of the PDF or CDF, but are instead multiplying the survival functions. The formula for the competing risks model is typically written in terms of the survival function (SF). Since we may consider the system’s reliability to depend on the reliability of all the parts of the system (each with its own failure modes), the equation is written as if the system was in series, using the product of the survival functions for each failure mode. For a competing risks model with 2 distributions, the equations are shown below:

\({SF}_{Competing\,Risks} = {SF}_1 \times {SF}_2\)

\({CDF}_{Competing\,Risks} = 1-{SF}_{Competing\,Risks}\)

Since \({SF} = exp(-CHF)\) we may equivalently write the competing risks model in terms of the hazard or cumulative hazard function as:

\({HF}_{Competing\,Risks} = {HF}_1 + {HF}_2\)

\({CHF}_{Competing\,Risks} = {CHF}_1 + {CHF}_2\)

\({PDF}_{Competing\,Risks} = {HF}_{Competing\,Risks} \times {SF}_{Competing\,Risks}\)

The image below illustrates the difference between the competing risks model and the mixture model, each of which is made up of the same two component distributions. Note that the PDF of the competing risks model is always equal to or to the left of the component distributions, and the CDF is equal to or higher than the component distributions. This shows how a failure mode that occurs earlier in time can end the lives of units under observation before the second failure mode has the chance to. This behaviour is characteristic of real systems which experience multiple failure modes, each of which could cause system failure.

_images/CRvsMM1.png

Competing risks models are useful when there is more than one failure mode that is generating the failure data. This can be recognised by the shape of the PDF and CDF being outside of what any single distribution can accurately model. On a probability plot, a combination of failure modes can be identified by bends in the data that you might otherwise expect to be linear. An example of this is shown in the image below. You should not use a competing risks model just because it fits your data better than a single distribution, but you should use a competing risks model if you suspect that there are multiple failure modes contributing to the failure data you are observing. To judge whether a competing risks model is justified, look at the goodness of fit criterion (AICc or BIC) which penalises the score based on the number of parameters in the model. The closer the goodness of fit criterion is to zero, the better the fit. It is not appropriate to use the Log-likelihood or AD goodness of fit criterions as these do not penalise the score based on the number of parameters, therefore making the model susceptible to overfitting.

See also mixture models for another method of combining distributions using the sum of the CDF rather than the product of the SF.

_images/CRprobplot1.png

Creating a competing risks model

Within reliability.Distributions is the Competing_Risks_Model. This function accepts an array or list of distribution objects created using the reliability.Distributions module (available distributions are Exponential, Weibull, Gumbel, Normal, Lognormal, Loglogistic, Gamma, Beta). There is no limit to the number of components you can add to the model, but it is generally preferable to use as few as are required to fit the data appropriately (typically 2 or 3). Unlike the mixture model, you do not need to specify any proportions.

As this process is multiplicative for the survival function (or additive for the hazard function), and may accept many distributions of different types, the mathematical formulation quickly gets complex. For this reason, the algorithm combines the models numerically rather than empirically so there are no simple formulas for many of the descriptive statistics (mean, median, etc.). Also, the accuracy of the model is dependent on xvals. If the xvals array is small (<100 values) then the answer will be “blocky” and inaccurate. The variable xvals is only accepted for PDF, CDF, SF, HF, and CHF. The other methods (like random samples) use the default xvals for maximum accuracy. The default number of values generated when xvals is not given is 1000. Consider this carefully when specifying xvals in order to avoid inaccuracies in the results.

The API is similar to the other probability distributions (Weibull, Normal, etc.) and has the following inputs and methods:

Inputs:

  • distributions - a list or array of probability distributions used to construct the model

Methods:

  • name - ‘Competing risks’
  • name2 - ‘Competing risks using 3 distributions’
  • mean
  • median
  • mode
  • variance
  • standard_deviation
  • skewness
  • kurtosis
  • excess_kurtosis
  • b5 - The time where 5% have failed. Same as quantile(0.05)
  • b95 - The time where 95% have failed. Same as quantile(0.95)
  • plot() - plots all functions (PDF,CDF,SF,HF,CHF)
  • PDF() - plots the probability density function
  • CDF() - plots the cumulative distribution function
  • SF() - plots the survival function (also known as reliability function)
  • HF() - plots the hazard function
  • CHF() - plots the cumulative hazard function
  • quantile() - Calculates the quantile (time until a fraction has failed) for a given fraction failing. Also known as b life where b5 is the time at which 5% have failed.
  • inverse_SF() - the inverse of the Survival Function. This is useful when producing QQ plots.
  • mean_residual_life() - Average residual lifetime of an item given that the item has survived up to a given time. Effectively the mean of the remaining amount (right side) of a distribution at a given time.
  • stats() - prints all the descriptive statistics. Same as the statistics shown using .plot() but printed to console.
  • random_samples() - draws random samples from the distribution to which it is applied.

Example 1

The following example shows how the Competing_Risks_Model object can be created, visualised and used.

from reliability.Distributions import Lognormal_Distribution, Gamma_Distribution, Weibull_Distribution, Competing_Risks_Model
import matplotlib.pyplot as plt

# create the competing risks model
d1 = Lognormal_Distribution(mu=4, sigma=0.1)
d2 = Weibull_Distribution(alpha=50, beta=2)
d3 = Gamma_Distribution(alpha=30,beta=1.5)
CR_model = Competing_Risks_Model(distributions=[d1, d2, d3])

# plot the 5 functions using the plot() function
CR_model.plot()

# plot the PDF and CDF
plot_components = True # this plots the component distributions. Default is False
plt.figure(figsize=(9, 5))
plt.subplot(121)
CR_model.PDF(plot_components=plot_components, color='red', linestyle='--')
plt.subplot(122)
CR_model.CDF(plot_components=plot_components, color='red', linestyle='--')
plt.show()

# extract the mean of the distribution
print('The mean of the distribution is:', CR_model.mean)

'''
The mean of the distribution is: 27.04449126273065
'''
_images/CR_model_plotV2.png _images/CR_model_PDF_CDFV2.png

Fitting a competing risks model

Within reliability.Fitters is Fit_Weibull_CR. This function will fit a Weibull Competing Risks Model consisting of 2 x Weibull_2P distributions (this does not fit the gamma parameter). Just as with all of the other distributions in reliability.Fitters, right censoring is supported.

Whilst some failure modes may not be fitted as well by a Weibull distribution as they may be by another distribution, it is unlikely that a competing risks model of data from two distributions (particularly if they are overlapping) will be fitted noticeably better by other types of competing risks models than would be achieved by a Weibull Competing Risks Model. For this reason, other types of competing risks models are not implemented.

Inputs:

  • failures - an array or list of the failure data. There must be at least 4 failures, but it is highly recommended to use another model if you have less than 20 failures.
  • right_censored - an array or list of right censored data. Optional input.
  • print_results - True/False. This will print results to console. Default is True.
  • CI - confidence interval for estimating confidence limits on parameters. Must be between 0 and 1. Default is 0.95 for 95% CI.
  • optimizer - ‘L-BFGS-B’, ‘TNC’, or ‘powell’. These are all bound constrained methods. If the bounded method fails, nelder-mead will be used. If nelder-mead fails then the initial guess will be returned with a warning. For more information on optimizers see scipy.
  • show_probability_plot - True/False. This will show the probability plot with the fitted Weibull_CR CDF. Default is True.

Outputs:

  • alpha_1 - the fitted Weibull_2P alpha parameter for the first distribution
  • beta_1 - the fitted Weibull_2P beta parameter for the first distribution
  • alpha_2 - the fitted Weibull_2P alpha parameter for the second distribution
  • beta_2 - the fitted Weibull_2P beta parameter for the second distribution
  • alpha_1_SE - the standard error on the parameter
  • beta_1_SE - the standard error on the parameter
  • alpha_2_SE - the standard error on the parameter
  • beta_2_SE - the standard error on the parameter
  • alpha_1_upper - the upper confidence interval estimate of the parameter
  • alpha_1_lower - the lower confidence interval estimate of the parameter
  • beta_1_upper - the upper confidence interval estimate of the parameter
  • beta_1_lower - the lower confidence interval estimate of the parameter
  • alpha_2_upper - the upper confidence interval estimate of the parameter
  • alpha_2_lower - the lower confidence interval estimate of the parameter
  • beta_2_upper - the upper confidence interval estimate of the parameter
  • beta_2_lower - the lower confidence interval estimate of the parameter
  • loglik - Log Likelihood (as used in Minitab and Reliasoft)
  • loglik2 - LogLikelihood*-2 (as used in JMP Pro)
  • AICc - Akaike Information Criterion
  • BIC - Bayesian Information Criterion
  • AD - Anderson-Darling goodness of fit statistic
  • results - a dataframe of the results (point estimate, standard error, Lower CI and Upper CI for each parameter)
  • goodness_of_fit - a dataframe of the goodness of fit values (Log-likelihood, AICc, BIC, AD).

Example 2

In this example, we will create some data using a competing risks model from two Weibull distributions. We will then fit the Weibull mixture model to the data and will print the results and show the plot.

from reliability.Distributions import Weibull_Distribution, Competing_Risks_Model
from reliability.Fitters import Fit_Weibull_CR
from reliability.Other_functions import histogram
import matplotlib.pyplot as plt

# create some data that requires a competing risks models
d1 = Weibull_Distribution(alpha=50, beta=2)
d2 = Weibull_Distribution(alpha=40, beta=10)
CR_model = Competing_Risks_Model(distributions=[d1, d2])
data = CR_model.random_samples(100, seed=2)

# fit the Weibull competing risks model
results = Fit_Weibull_CR(failures=data)

# this section is to visualise the histogram with PDF and CDF
# it is not part of the default output from the Fitter
plt.figure(figsize=(9, 5))
plt.subplot(121)
histogram(data)
results.distribution.PDF()
plt.subplot(122)
histogram(data, cumulative=True)
results.distribution.CDF()

plt.show()

'''
Results from Fit_Weibull_CR (95% CI):
Analysis method: MLE
Failures / Right censored: 100/0 (0% right censored)

Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
  Alpha 1         55.2695         14.3883   33.1812   92.0615
   Beta 1         1.89484        0.452994   1.18598   3.02738
  Alpha 2          38.175         1.07992    36.116   40.3514
   Beta 2         7.97514         1.18035   5.96701   10.6591

Goodness of fit    Value
 Log-likelihood -352.479
           AICc   713.38
            BIC  723.379
             AD 0.390325
'''
_images/CR_fit_probplot2.png _images/CR_fit_hist1.png

Example 3

In this example, we will compare the mixture model to the competing risks model. The data is generated from a competing risks model so we expect the Weibull competing risks model to be more appropriate than the Mixture model. Through comparison of the AICc or BIC, we can see which model is more appropriate. Since the AICc and BIC penalise the goodness of fit criterion based on the number of parameters and the mixture model has 5 parameters compared to the competing risk model’s 4 parameters, we expect the competing risks model to have a lower (closer to zero) goodness of fit than the Mixture model, and this is what we observe in the results. Notice how the log-likelihood and AD statistics of the mixture model indicates a better fit (because the value is closer to zero), but this does not take into account the number of parameters in the model.

from reliability.Distributions import Weibull_Distribution, Competing_Risks_Model
from reliability.Fitters import Fit_Weibull_CR, Fit_Weibull_Mixture
import matplotlib.pyplot as plt
import pandas as pd

# create some data from a competing risks model
d1 = Weibull_Distribution(alpha=250, beta=2)
d2 = Weibull_Distribution(alpha=210, beta=10)
CR_model = Competing_Risks_Model(distributions=[d1, d2])
data = CR_model.random_samples(50, seed=2)

CR_fit = Fit_Weibull_CR(failures=data)  # fit the Weibull competing risks model
print('----------------------------------------')
MM_fit = Fit_Weibull_Mixture(failures=data)  # fit the Weibull mixture model
plt.legend()
plt.show()
print('----------------------------------------')

# create a dataframe to display the goodness of fit criterion as a table
goodness_of_fit = {'Model': ['Competing Risks', 'Mixture'], 'AICc': [CR_fit.AICc, MM_fit.AICc], 'BIC': [CR_fit.BIC, MM_fit.BIC], 'AD': [CR_fit.AD, MM_fit.AD]}
df = pd.DataFrame(goodness_of_fit, columns=['Model', 'AICc', 'BIC', 'AD'])
print(df)

'''
Results from Fit_Weibull_CR (95% CI):
Analysis method: MLE
Failures / Right censored: 50/0 (0% right censored)

Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
  Alpha 1         229.868         51.2178   148.531   355.744
   Beta 1         2.50124        0.747103   1.39286   4.49162
  Alpha 2         199.717         8.56554   183.615   217.231
   Beta 2         9.20155         2.20135   5.75734   14.7062

Goodness of fit    Value
 Log-likelihood -255.444
           AICc  519.777
            BIC  526.536
             AD 0.582534

----------------------------------------
Results from Fit_Weibull_Mixture (95% CI):
Analysis method: MLE
Failures / Right censored: 50/0 (0% right censored)

    Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
      Alpha 1          100.43         12.4539   78.7606   128.061
       Beta 1         4.07758         1.21231   2.27684   7.30253
      Alpha 2         189.763         5.13941   179.952   200.108
       Beta 2         7.70218         1.35191    5.4602   10.8647
 Proportion 1        0.215596       0.0815977 0.0964594  0.414393

Goodness of fit    Value
 Log-likelihood -254.471
           AICc  520.306
            BIC  528.503
             AD 0.529284

----------------------------------------
             Model    AICc     BIC       AD
0  Competing Risks 519.777 526.536 0.582534
1          Mixture 520.306 528.503 0.529284
'''
_images/CRvsMM_fitV4.png