DSZI models

What are DSZI models?

DSZI is an acronym for “Defective Subpopulation Zero Inflated”. It is a combination of the Defective Subpopulation (DS) model and the Zero Inflated (ZI) model.

A defective subpopulation model is one in which the CDF does not reach 1 during the period of observation. This occurs when a portion of the population fails (known as the defective subpopulation) but the remainder of the population does not fail (and is right censored) by the end of the observation period.

A zero inflated model is one in which the CDF starts above 0 at the start of the observation period. This is caused by many “dead-on-arrival” items from the population, represented by failure times of 0. This is not the same as left censored data, since a left censored failure is one that occurred at some unknown time between 0 and the observation time. In the zero inflated model, the observation time is considered to start at 0, so the failure times are exactly 0.

In a DSZI model, the CDF (which normally goes from 0 to 1) goes from above 0 to below 1, as shown in the image below. In this image the PDF and CDF have been rescaled so that they can both be viewed together. In reality the values of the CDF are much larger than those of the PDF.

A DSZI model may be applied to any distribution (Weibull, Normal, Lognormal, etc.) using the transformations explained in the next section. The plot below shows how a Weibull distribution can become a DS_Weibull, ZI_Weibull and DSZI_Weibull. Note that the PDF of the DS, ZI, and DSZI models appears smaller than that of the original Weibull model since the area under the PDF is no longer 1. This is because the CDF does not range from 0 to 1.

Equations of DSZI models

A DSZI Model applies a minor modification to the PDF and CDF of any standard distribution (referred to here as the “base distribution”). The transformations are as follows:

\(PDF_{DSZI} = PDF_{base} \times (DS - ZI)\)

\(CDF_{DSZI} = CDF_{base} \times (DS - ZI) + ZI\)

In the above equations the base distribution (represented by \(PDF_{base}\) and \(CDF_{base}\)) is transformed using the parameters DS and ZI. DS is the maximum of the CDF, which represents the fraction of the total population that is defective (the defective subpopulation). ZI is the minimum of the CDF, which represents the fraction of the total population that failed at t=0 or, equivalently, was “dead-on-arrival” (the zero inflated fraction). To create only a DS model, set ZI to 0. To create only a ZI model, set DS to 1. The parameters DS and ZI must be between 0 and 1, and DS must be greater than ZI.

The above equations can be expanded depending on the equation of the base distribution. For example, if the base distribution is a two parameter Weibull distribution, the DSZI model would be:

\(\text{PDF:} \hspace{11mm} f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{(\beta-1)}{\rm e}^{-(\frac{t}{\alpha })^ \beta } \left(DS - ZI \right)\)

\(\text{CDF:} \hspace{10mm} F(t) = \left(1 - {\rm e}^{-(\frac{t}{\alpha })^ \beta }\right) \left(DS - ZI \right) + ZI\)
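The two Weibull equations above can be checked numerically with a few lines of plain Python. The sketch below is only an illustration of the formulas, not the library's implementation; the function names (weibull_pdf, weibull_cdf, dszi_pdf, dszi_cdf) are hypothetical and the parameter values are arbitrary.

```python
import math


def weibull_pdf(t, alpha, beta):
    """PDF of the two parameter Weibull base distribution (for t > 0)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1) * math.exp(-((t / alpha) ** beta))


def weibull_cdf(t, alpha, beta):
    """CDF of the two parameter Weibull base distribution (for t >= 0)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def dszi_pdf(t, alpha, beta, DS=1.0, ZI=0.0):
    """PDF_DSZI = PDF_base * (DS - ZI)."""
    return weibull_pdf(t, alpha, beta) * (DS - ZI)


def dszi_cdf(t, alpha, beta, DS=1.0, ZI=0.0):
    """CDF_DSZI = CDF_base * (DS - ZI) + ZI."""
    return weibull_cdf(t, alpha, beta) * (DS - ZI) + ZI


# Arbitrary illustrative parameters (assumptions, not taken from a fitted model)
alpha, beta, DS, ZI = 50.0, 2.0, 0.8, 0.3
print(dszi_cdf(0.0, alpha, beta, DS, ZI))  # the CDF starts at ZI
print(dszi_cdf(1e6, alpha, beta, DS, ZI))  # the CDF levels off at DS
```

With the default values DS=1 and ZI=0, both functions reduce to the base Weibull distribution.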

The SF, HF and CHF can be obtained from the CDF and PDF using the standard relationships between the five functions.
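As a minimal sketch of those relationships, the snippet below assumes the usual identities SF = 1 - CDF, HF = PDF / SF and CHF = -ln(SF), applied to a Weibull base distribution. The helper name dszi_five_functions is hypothetical, and this is not necessarily how the library computes these functions internally.

```python
import math


def dszi_five_functions(t, alpha, beta, DS=1.0, ZI=0.0):
    """Evaluate the five functions of a Weibull based DSZI model at time t > 0."""
    base_pdf = (beta / alpha) * (t / alpha) ** (beta - 1) * math.exp(-((t / alpha) ** beta))
    base_cdf = 1.0 - math.exp(-((t / alpha) ** beta))
    pdf = base_pdf * (DS - ZI)
    cdf = base_cdf * (DS - ZI) + ZI
    sf = 1.0 - cdf          # survival function
    hf = pdf / sf           # hazard function (undefined if sf reaches 0)
    chf = -math.log(sf)     # cumulative hazard function
    return {"PDF": pdf, "CDF": cdf, "SF": sf, "HF": hf, "CHF": chf}


# Arbitrary illustrative parameters (assumptions)
funcs = dszi_five_functions(40.0, alpha=50.0, beta=2.0, DS=0.8, ZI=0.3)
for name, value in funcs.items():
    print(name, value)
```

Under these identities the SF runs from 1 - ZI down to 1 - DS rather than from 1 down to 0, so the CHF is already positive at t=0 whenever ZI > 0.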

Creating a DSZI model

Within reliability, the DSZI Model is available within the Distributions module. The input requires the base distribution to be specified using a distribution object and the DS and ZI parameters to be specified if required. DS defaults to 1 and ZI defaults to 0. The output API matches the API for the standard distributions.

API Reference

For inputs and outputs see the API reference.

Example 1

In this first example, we will create a Gamma DSZI model and plot the 5 functions.

from reliability.Distributions import Gamma_Distribution, DSZI_Model
model = DSZI_Model(distribution=Gamma_Distribution(alpha=50, beta=2), DS=0.8, ZI=0.3)
model.plot()

Example 2

In this second example, we will create a Lognormal_DS model, draw some random samples and plot those samples on the survival function plot.

from reliability.Distributions import Lognormal_Distribution, DSZI_Model
from reliability.Probability_plotting import plot_points
import matplotlib.pyplot as plt
model = DSZI_Model(distribution=Lognormal_Distribution(mu=2, sigma=0.5), DS=0.75)
failures, right_censored = model.random_samples(50, seed=7, right_censored_time=50)
model.SF()
plot_points(failures = failures, right_censored = right_censored, func="SF")
plt.show()

Note that in the above example, the random_samples function returns failures and right_censored values. This differs from all other Distributions, which only return failures. The reason for returning both failures and right_censored data is that it is essential to have right_censored data in order to have a DS Model.
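To illustrate why sampling from a DSZI model naturally produces both failures and right censored values, here is a minimal inverse-transform sketch in plain Python. It is an assumption about one reasonable way to sample, not the library's random_samples implementation. The function name sample_dszi_weibull and the fallback used when right_censored_time is omitted are hypothetical, and the sketch does not handle failures that fall later than right_censored_time.

```python
import math
import random


def sample_dszi_weibull(n, alpha, beta, DS=1.0, ZI=0.0, right_censored_time=None, seed=None):
    """Draw n items from a Weibull based DSZI model (illustrative sketch only)."""
    rng = random.Random(seed)
    failures = []
    n_censored = 0
    for _ in range(n):
        u = rng.random()
        if u < ZI:
            # zero inflated fraction: dead-on-arrival
            failures.append(0.0)
        elif u < DS:
            # defective item: rescale u to [0, 1) and invert the Weibull CDF
            p = (u - ZI) / (DS - ZI)
            failures.append(alpha * (-math.log(1.0 - p)) ** (1.0 / beta))
        else:
            # non-defective item: never fails, so it is right censored
            n_censored += 1
    if right_censored_time is None:
        # hypothetical fallback: censor just after the last observed failure
        right_censored_time = max(failures, default=0.0) + 1.0
    right_censored = [right_censored_time] * n_censored
    return failures, right_censored


failures, right_censored = sample_dszi_weibull(
    100, alpha=1200, beta=3, DS=0.7, ZI=0.2, right_censored_time=3000, seed=5
)
print(len(failures), len(right_censored))
```

Roughly a fraction 1 - DS of the items never fail in this scheme, which is why a DS model cannot be represented by failure times alone.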

Fitting a DSZI model

API Reference

For inputs and outputs see the API reference for Fit_Weibull_DS, Fit_Weibull_ZI, and Fit_Weibull_DSZI.

As we saw above, the DSZI_Model can be either DS, ZI, or DSZI depending on the values of the DS and ZI parameters. Within the Fitters module, three functions are offered, one for each of these cases, with the Weibull_2P distribution as the base distribution. The three Fitters available are Fit_Weibull_DS, Fit_Weibull_ZI, and Fit_Weibull_DSZI. If your data contains zeros then only the Fit_Weibull_ZI and Fit_Weibull_DSZI fitters are appropriate. Using any other fitter will cause the zeros to be automatically removed and a warning to be printed.

Fit_Weibull_ZI does not mandate that the failures contain zeros, but if failures does not contain zeros then ZI will be 0 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.

Fit_Weibull_DS does not mandate that right_censored data is provided, but if right_censored data is not provided then DS will be 1 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.

Fit_Weibull_DSZI does not mandate that failures contain zeros or that right_censored data is provided. If right_censored data is not provided then DS will be 1. If failures does not contain zeros then ZI will be 0. If failures does not contain zeros and no right censored data is provided then DS will be 1, ZI will be 0 and the alpha and beta parameters will be equivalent to the results from Fit_Weibull_2P.
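To make the roles of zeros and right censored data in the fit more concrete, the following sketch writes out one plausible log-likelihood for a Weibull based DSZI model: zeros contribute ln(ZI), non-zero failures contribute the log of the DSZI PDF, and right censored items contribute the log of the DSZI SF. This is an assumption about the form of the likelihood, not a copy of the library's code. The function name weibull_dszi_loglik and the toy data are hypothetical.

```python
import math


def weibull_dszi_loglik(failures, right_censored, alpha, beta, DS=1.0, ZI=0.0):
    """Log-likelihood of a Weibull based DSZI model (illustrative sketch only)."""
    ll = 0.0
    for t in failures:
        if t == 0:
            # dead-on-arrival item; requires ZI > 0
            ll += math.log(ZI)
        else:
            base_pdf = (beta / alpha) * (t / alpha) ** (beta - 1) * math.exp(-((t / alpha) ** beta))
            ll += math.log(base_pdf * (DS - ZI))
    for c in right_censored:
        base_cdf = 1.0 - math.exp(-((c / alpha) ** beta))
        # probability of surviving beyond the censoring time
        ll += math.log(1.0 - (base_cdf * (DS - ZI) + ZI))
    return ll


# Toy data made up purely for illustration: 3 zeros out of 10 failures
toy_failures = [0, 0, 0, 150, 180, 210, 190, 170, 200, 220]
ll_03 = weibull_dszi_loglik(toy_failures, [], alpha=200, beta=5, ZI=0.3)
ll_01 = weibull_dszi_loglik(toy_failures, [], alpha=200, beta=5, ZI=0.1)
print(ll_03 > ll_01)  # ZI equal to the observed fraction of zeros scores higher
```

With DS=1 and no zeros in the data, the ZI terms vanish and the expression reduces to the ordinary Weibull_2P log-likelihood, which is consistent with the equivalence to Fit_Weibull_2P described above.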

Example 3

In this example, we will create 70 samples of failure data from a Weibull Distribution, and append 30 zeros to it. We will then use Fit_Weibull_ZI to model the data.

from reliability.Distributions import Weibull_Distribution
from reliability.Fitters import Fit_Weibull_ZI
from reliability.Probability_plotting import plot_points
import numpy as np
import matplotlib.pyplot as plt

data = Weibull_Distribution(alpha=200, beta=5).random_samples(70, seed=1)
zeros = np.zeros(30)
failures = np.hstack([zeros, data])
plt.subplot(121)
fit = Fit_Weibull_ZI(failures=failures)
plt.subplot(122)
fit.distribution.CDF()
plot_points(failures=failures)
plt.tight_layout()
plt.show()

'''
Results from Fit_Weibull_ZI (95% CI):
Analysis method: Maximum Likelihood Estimation (MLE)
Optimizer: TNC
Failures / Right censored: 100/0 (0% right censored)

Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
    Alpha         192.931         5.33803   182.747   203.682
     Beta         4.53177        0.431272   3.76064   5.46102
       ZI             0.3       0.0458258  0.218403  0.396613

Goodness of fit    Value
 Log-likelihood -426.504
           AICc  859.259
            BIC  866.824
             AD  5.88831
'''

We can see above how the fitter correctly identified that the distribution was 30% zero inflated, and it did a reasonable job of finding the alpha and beta parameters of the base distribution.
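The ZI row of the output above can be sanity-checked by hand. The point estimate equals the observed fraction of zeros (30 out of 100), and the reported standard error agrees numerically with the usual standard error of a proportion, sqrt(p(1-p)/n). The snippet below is only an arithmetic check under that assumption; it does not show how the fitter computes its standard errors.

```python
import math

n_zeros = 30     # zeros appended to the data in this example
n_total = 100    # total number of failures passed to the fitter

zi_hat = n_zeros / n_total                           # observed fraction of zeros
se_zi = math.sqrt(zi_hat * (1 - zi_hat) / n_total)   # standard error of a proportion

print(zi_hat)
print(round(se_zi, 7))
```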

Example 4

In this example, we will use Fit_Weibull_DS to model some data that is heavily right censored. The DS=0.4 parameter means that only about 40% of the samples are expected to be failures, with the rest being right censored. The original distribution is overlaid in the plot for comparison of the goodness of fit.

from reliability.Distributions import DSZI_Model, Weibull_Distribution
from reliability.Fitters import Fit_Weibull_DS
import matplotlib.pyplot as plt
from reliability.Probability_plotting import plot_points

model = DSZI_Model(distribution=Weibull_Distribution(alpha=70, beta=2.5), DS=0.4)
failures, right_censored = model.random_samples(100, right_censored_time=120, seed=3)
model.CDF(label="true model", xmax=300)
fit_DS = Fit_Weibull_DS(failures=failures, right_censored=right_censored, show_probability_plot=False)
fit_DS.distribution.CDF(label="fitted Weibull_DS", xmax=300)
plot_points(failures=failures, right_censored=right_censored)
plt.legend()
plt.show()

'''
Results from Fit_Weibull_DS (95% CI):
Analysis method: Maximum Likelihood Estimation (MLE)
Optimizer: TNC
Failures / Right censored: 41/59 (59% right censored)

Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
    Alpha         67.9275         4.61424   59.4599   77.6009
     Beta         2.63207        0.357826    2.0164   3.43571
       DS        0.414739       0.0500682  0.321106  0.514964

Goodness of fit    Value
 Log-likelihood -254.236
           AICc  514.721
            BIC  522.287
             AD  374.746
'''

Example 5

In this example, we will use some real world data from a vehicle manufacturer, which is available in the Datasets module. This example shows how the Weibull_2P model can be an inappropriate choice for a dataset that is heavily right censored. In addition to the visual proof provided by the probability plot (left) and the CDF (right), we can see that the log-likelihood, AICc and BIC all indicate that Weibull_DS was much better (closer to zero) than Weibull_2P. The AD statistic barely differs between the two fits.

from reliability.Fitters import Fit_Weibull_DS, Fit_Weibull_2P
import matplotlib.pyplot as plt
from reliability.Probability_plotting import plot_points
from reliability.Datasets import defective_sample

failures = defective_sample().failures
right_censored = defective_sample().right_censored

plt.subplot(121)
fit_DS = Fit_Weibull_DS(failures=failures, right_censored=right_censored)
print('-------------------------------------------')
fit_2P = Fit_Weibull_2P(failures=failures, right_censored=right_censored)

plt.subplot(122)
fit_DS.distribution.CDF(label="fitted Weibull_DS",xmax=1000)
fit_2P.distribution.CDF(label="fitted Weibull_2P",xmax=1000)
plot_points(failures=failures, right_censored=right_censored)
plt.ylim(0,0.25)
plt.legend()
plt.title('Cumulative Distribution Function')
plt.suptitle('Comparison of Weibull_2P with Weibull_DS')
plt.gcf().set_size_inches(12,6)
plt.tight_layout()
plt.show()

'''
Results from Fit_Weibull_DS (95% CI):
Analysis method: Maximum Likelihood Estimation (MLE)
Optimizer: TNC
Failures / Right censored: 1350/12295 (90.10627% right censored)

Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
    Alpha         170.983         4.61716   162.169   180.276
     Beta         1.30109       0.0297713   1.24403   1.36077
       DS         0.12482      0.00333709  0.118425  0.131509

Goodness of fit    Value
 Log-likelihood -11977.7
           AICc  23961.3
            BIC  23983.9
             AD  27212.4

-------------------------------------------
Results from Fit_Weibull_2P (95% CI):
Analysis method: Maximum Likelihood Estimation (MLE)
Optimizer: TNC
Failures / Right censored: 1350/12295 (90.10627% right censored)

Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
    Alpha         10001.5         883.952    8410.7   11893.1
     Beta        0.677348        0.016663  0.645463  0.710807

Goodness of fit    Value
 Log-likelihood -12273.2
           AICc  24550.3
            BIC  24565.4
             AD    27213
'''

Example 6

In this example we will create a DSZI model with DS=0.7 and ZI=0.2. Based on these parameters, we expect around 70% of the random samples to be failures (with the remaining 30% right censored), and around 20% of the total samples (failures + right censored) to be zeros due to the zero inflated fraction. We draw the random samples from the model and then fit a Weibull_DSZI model to the data. The result is surprisingly accurate, showing DS=0.700005 and ZI=0.22, with the alpha and beta parameters reasonably close to the parameters of the input Weibull Distribution. The plot below shows the CDF on the Weibull probability plot (left) and on linear axes (right), which each provide a different perspective of how the distribution models the failure points.

from reliability.Distributions import DSZI_Model, Weibull_Distribution
from reliability.Probability_plotting import plot_points
import matplotlib.pyplot as plt
from reliability.Fitters import Fit_Weibull_DSZI

model = DSZI_Model(distribution=Weibull_Distribution(alpha=1200,beta=3),DS=0.7,ZI=0.2)
failures, right_censored = model.random_samples(100,seed=5,right_censored_time=3000)

plt.subplot(121)
fit = Fit_Weibull_DSZI(failures=failures,right_censored=right_censored,label='fitted Weibull_DSZI')
model.CDF(label='true model')
plt.legend()

plt.subplot(122)
fit.distribution.CDF(label='fitted Weibull_DSZI')
model.CDF(label='true model')
plot_points(failures=failures,right_censored=right_censored)
plt.legend()
plt.tight_layout()
plt.show()

'''
Results from Fit_Weibull_DSZI (95% CI):
Analysis method: Maximum Likelihood Estimation (MLE)
Optimizer: TNC
Failures / Right censored: 70/30 (30% right censored)

Parameter  Point Estimate  Standard Error  Lower CI  Upper CI
    Alpha         1170.12         68.0933   1043.99   1311.49
     Beta         2.60255        0.299069   2.07771   3.25997
       DS        0.700005        0.045826  0.603391  0.781602
       ZI            0.22       0.0414247  0.149465  0.311627

Goodness of fit    Value
 Log-likelihood -463.613
           AICc  935.647
            BIC  945.646
             AD  166.025
'''

The DSZI model is a model of my own making. It combines the well established DS and ZI models together for the first time to enable heavily right censored data to be modelled using a DS distribution while also allowing for zero inflation of the failures.