## INTRODUCTION

Preweaning mortality of piglets has become an important issue because of economic and animal welfare concerns (Grandison et al., 2002). Recent studies used different approaches and methodologies to analyze preweaning survival. One possible option is to consider survivability as a binary outcome (e.g., 0 = alive at time *t*; 1 = dead at time *t*), and analyze it using a threshold model (**TM**; Gianola, 1982; Gianola and Foulley, 1983). Most investigations have traditionally used linear models to analyze piglet mortality (or piglet survival) assuming the continuous distribution of the trait, but ignoring its categorical nature and censoring (Van Arendonk et al., 1996; Knol et al., 2002). Another possible approach is the analysis of failure time during the period of interest (i.e., time from birth to death) with regression models (Cox, 1972; Prentice and Gloeckler, 1978). The advantage of using a time-to-event model is that there is no need to restrict the observations to an arbitrarily defined point as in the TM (Ducrocq, 1997), and the model allows handling censoring in a more appropriate manner. A third possible approach is using a sequential threshold model (**STM**; Albert and Chib, 2001), which can analyze categorical traits that occur in a sequential order and accommodate time-dependent covariates. This model was used recently to compare predictive ability between STM, TM, and survival analysis for number of inseminations to conception in Holstein cows (González-Recio et al., 2005).

The objectives of this study were to infer parameters of piglet preweaning survival using survival analysis, TM, and STM in a crossbred pig population and to compare models in terms of predictive abilities and goodness of fit.

## MATERIALS AND METHODS

Animal Care and Use Committee approval was not obtained for this study because the data were from an existing database. Records were registered in the sib testing program of the C21 Large White boar line (Gorzagri, Fonzaso, Italy) from 2000 to 2006.

### Animals and Data

The sib testing program and the breeding goal of the C21 Large White boar line are fully described in Cecchinato et al. (2008). Briefly, data on survival of piglets at birth up to weaning have routinely been collected in the sib testing program since 2000 and included birth litter description (sow ID, parity, sire ID, date of farrowing, and litter size at birth), individual piglet information (i.e., ID, sex, status at birth: stillborn or alive), and weaning or death (if the piglet died during the suckling period) date, as well as the date of transfer and the foster dam ID for cross-fostered piglets. Cross-fostering occurred for 46% of the litters and was of similar proportion for male and female piglets. For piglets dying before weaning, survival time was computed as the difference between the dates of death and birth. For piglets still alive at weaning, time to censoring was computed as the difference between the dates of weaning and birth.

After editing procedures, which aimed to discard records with incomplete or inconsistent information (120 piglets) and with unknown sire (105 piglets), data from 13,924 piglets (1,347 litters) sired by 189 C21 boars and mated to 328 Large White derived crossbred sows were available for the study.

Piglet survival time was analyzed using 2 survival analysis models: Cox and Weibull time-dependent model (**WTD**), and 2 threshold models: TM and STM. These methods are described next.

## Survival Analysis

The survivor function for the general population was estimated using the Kaplan-Meier method (Kaplan and Meier, 1958). To test whether a Weibull distribution properly fit the data, the log of the Kaplan-Meier estimate (nonparametric) of the survivor curves was plotted against the log of time. If the assumption for Weibull holds, a straight line should be obtained. Because the relationship was not linear (not shown), the assumption was rejected and alternative models were investigated.

### Cox Model.

The distribution function for the baseline is left completely unspecified (Cox, 1972), and the model is a semi-parametric proportional hazard model. The hazard function can be expressed aswhere = hazard function (instantaneous probability of death) for a given piglet at time *t*; *t* = time (measured in days) from birth until death or weaning (censoring); *h*_{0}(*t*) = baseline hazard function; = time-independent fixed effect of cross-fostering (*i*, 0 = no; 1 = yes); = time-independent fixed effect of sex (*j*, 0 = male; 1 = female); = time-independent fixed effect of year-month of birth (*k* = 72 monthly classes from July 2000 to July 2006); = time-independent fixed effect of class of standardized total merit index of the sire of the piglet (*l* = class 1: *TMI* < −1 SD; class 2: −1 SD ≤ *TMI* ≤ 1 SD; class 3: *TMI* > 1 SD); = time-dependent fixed effect of parity of the nurse sow that could change value in the time space as a consequence of cross-fostering (*m* = 1 to 7 or more); = time-dependent fixed effect of nursed litter size that could change value in the time space as a consequence of cross-fostering, carried out to homogenize size of litters, and of piglet mortality (*n* = class 1: ≤5 piglets; class 2: from 6 to 8 piglets; class 3: from 9 to 11 piglets; class 4: from 12 to 14 piglets; class 5: ≥15 piglets); = time-dependent random effect of the nursed litter that might change as a consequence of cross-fostering (*o* = 1 to 1,347), assumed to be independently distributed as a log-gamma with shape parameter γ from which the variance of the nursed litter effect can be derived; = time-independent random effect of sire effects (*p* = 1 to 189) assumed to follow a multivariate normal distribution with mean 0 and variance where **A** is the additive genetic relationship matrix among sires and is the variance component for sire.

The sire variance and the γ parameter of the log-gamma distribution were estimated using the Bayesian approach described in Ducrocq and Casella (1996). Prior distributions (multivariate normal and log-gamma distributions for the sire and the nursed litter effects) were combined with the likelihood function of the data to obtain an expression proportional to the joint posterior density of all parameters. The likelihood function was replaced by a partial likelihood (Cox, 1972), which does not contain any information about the arbitrary baseline hazard function. Further technical details of the random parameter estimation under survival analysis are given in Ducrocq and Casella (1996).

### WTD.

A parametric distribution was assumed as baseline function after the inclusion of a time-dependent covariate (Tarrés et al., 2005). The inclusion of time-dependent covariates *P _{c}* converts the Weibull hazard function into a Weibull time-dependent hazard function:

This model assumes that the hazard for an individual is constant within each period of time but is different for the same individual across the *c* = 1, 2, …, *k* + 1 different periods, corresponding to *k* cut points. Cut points were identified, following the same approach used by Tarrés et al. (2005), with a spline regression of the log of the Kaplan-Meier survival function on time. To cope with this approach, all possible combinations of 1, 2, …, *k* cut points in the time space (preweaning period) were explored, avoiding combinations of adjacent days. In this study a time-dependent effect changing at d 6 and 12 was found. Details on the procedure are reported in Tarrés et al. (2005). Heritabilities on the effective and equivalent scale were determined according to Yazdi et al. (2002). All survival analyses were carried out using the Survival Kit software (Ducrocq and Solkner, 1994).

### TM.

A TM (Wright, 1934; Gianola, 1982; Gianola and Foulley, 1983) was used for the analysis of preweaning mortality as a binary outcome. The TM postulates an underlying continuous random variable, called liability, *i* = 1,…, *n*, such that the observed binary response is the result of the following relationship:where is a fixed threshold, and = 1 or = 0 corresponds to piglet *i* dead before weaning or not. The liability, is assumed to be normally distributed with mean vector **µ** and covariance matrix where is the residual variance in the underlying scale. Because the threshold and are not identifiable, these parameters were set to some arbitrary values = 0 and = 1) to denote origin and scale of measurement, respectively. Hence, it was assumed that the vector of liabilities, given **µ**, followed the distribution Then the probability that observation *i* is scored as 1, given a model parameter vector, is defined aswhere is the standard normal cumulative distribution function. The statistical model for liability waswhere is the value of the liability for a piglet sired by sire *p*, suckled into litter *o* of size *n* from a sow of parity *m,* with sire total merit index class *l*, born in the year-month class *k*, of sex *j*, and belonging to class of cross-fostering *i*. Some justification of the model is required. Effects were defined as in the survival model with the exceptions of the nurse sow parity nursed litter size and the permanent environmental effect determined by the nursed litter because TM methodology is not able to handle time-dependent covariates. In this case, piglets were attributed to size of the nursed litter, parity of the nurse sow, and nursed litter, identifying the most prolonged condition for the piglet. This raised a question about modeling simultaneously both maternal and permanent environmental effects. For a piglet that was cross-fostered to a different litter, accounting for both the biological and the nurse sow effects was not feasible because these effects were confounded. In the present study, the choice was to model the permanent environmental effect determined by the nursed litter.

A Bayesian implementation via the Gibbs sampler (Sorensen et al., 1995) for a multi-tier or hierarchical structure was adopted. Improper priors were used for systematic and dispersion parameters. Sire genetic effects were assumed to be distributed a priori following a *MVN*(**0**, **A** where **A** is the additive genetic relationship matrix among sires and is a variance component for sire effects. Parameters were drawn from the posterior conditional distributions using Gibbs sampling, as implemented in the TM software (http://snp.toulouse.inra.fr/~alegarra/). Based on the visual inspection of the trace plots, a chain of 200,000 iterations of length was used, with a burn-in of 20,000 rounds and a thin interval of 10 samples.

### STM.

Survivability of piglets to weaning can be expressed in a sequential order and analyzed using an STM (Albert and Chib, 2001). There are 3 important stages in the weaning process (birth to 3 d, 3 to 6 d, and 6 d up to weaning), which indicate that an observation present at a given stage of the sequence must have passed through all previous stages. In this study, a piglet alive at weaning survived to all previous stages. Therefore, a latent variable can be used to represent the propensity of the piglet to survive to the next stage. The response variable, *y _{i}*, can take the value

*j*only after stages 1, …,

*j*− 1 are reached, and then, either survive or failure (death) in stage

*j*is observed. Hence, the probability to survive at stage

*j*, conditionally on the event that the piglet survived to stage (

*j*− 1)th, is given bywhere now

**β**represents the systematic effects described in survival and threshold models. As before,

**q**and

**s**represent the effects of nursed litter and sire additive genetic effects, respectively. Further, the vector represents unordered cut points that do not need to be ordered as in the case of an ordinal threshold model (Albert and Chib, 2001).

This model can also be formulated in terms of latent variables expressing the propensity of a piglet to survive to the next stage. It is possible to define latent variables corresponding to each of the *j* stages, where The residuals were assumed distributed as We observe and we observe if the first latent variable and the second latent variable In general,

These latent variables can be incorporated into a Markov chain Monte Carlo sampling scheme. The latent variable representation can be simplified by incorporating the cut points into the mean function and fixing 1 of the cut points, usually Each latent variable can have different explanatory variables (González-Recio et al., 2005), therefore accommodating time-dependent effects (including the vector of cut points), and a binary-threshold model can be fitted for the survivability at each stage event. The response variable is failure (dead at the present stage) = 0, or survivor (passed to the next stage) = 1. Further details on this model may be found in Albert and Chib (2001) and González-Recio et al. (2005) in an animal breeding context.

Effects, other than the cut points, were as in previous models with the difference that the residual variance, which was fixed to 1, was assumed to be constant at each step of the sequence.

Posterior distributions of the parameters were estimated using a Gibbs sampling algorithm for STM (Sorensen et al., 1995; Sorensen and Gianola, 2002), drawing samples from a single chain of 200,000 iterations, with the first 10,000 samples discarded and a thin interval of 10 samples.

## Comparison Among Models

### Predictive Ability.

Prediction of future observations is a concern that might be answered using the predictive ability of models, a notation that arises naturally in Bayesian statistics (Matos et al., 1997). A suitable procedure to check for better predictive ability across different models is cross-validation (Shao, 1993). This method usually involves omitting a portion of the available data, fitting a predicting model to the remaining data, and then testing the model fit on the omitted portion. The data set was randomly partitioned into 4 data sets to perform a 4-fold cross-validation, with the restriction that all levels of fixed effects were represented in each partition, following Caraviello et al. (2004) and González Recio et al. (2005). In addition, the number of records per sire or litter was as evenly distributed among the 4 partitions as possible. A 4-fold cross-validation suited to the situation of this study with a relatively large data set was carried out. This cross-validation is not as computationally demanding as a leave-one-out could be and makes a complete use of the data (vs. only having 1 training and 1 testing subset). The predicted transmitting ability (**PTA**) for sires in each training set was calculated by the 4 models. Survivability observations (i.e., binary indicators of survival to 3, 6, or 28 d of age among piglets that had the opportunity to survive that long) for piglets in each training set were regressed (using logistic regression) on PTA obtained from each method, such that the PTA of each sire could be converted into his piglets probability of survival to 3, 6, and 28 d. Next, the expected number of piglets of each sire that would survive to 3, 6, or 28 d of age (among those that had an opportunity to survive that long) in the testing set was calculated by multiplying the probability of survival to a given age estimated from the training set by the total number of piglets per sire in the independent data set. The actual number of piglets in the independent data set that survived to 3, 6, or 28 d of life was subsequently determined for each sire, and the observed and expected numbers of survivors and failures were compared using the following χ^{2} statistic:

These χ^{2} statistics were summed across sire, and the model that produced the smallest sum was considered as the most accurate predictor of survivability. In addition, Spearman’s and Pearson’s correlations between EBV obtained with different models were calculated.

### Goodness of Fit.

Goodness of fit was assessed using the local weighted regression and the mean square error (**MSE**). After obtaining PTA of all sires, these were matched with the observed mortality rates in the original data set. Local weighted regression is a nonparametric approach for fitting curves and surfaces to databased on smoothing (Cleveland and Loader, 1996). This method was used to approximate the relationship between mortality rate (response variable) and PTA estimates (explanatory variables) locally by a smooth curve based on a nonparametric function, using locally weighted least squares. Weights are assigned such that points close (in the Euclidean distance) to the predictor value of interest receive a greater weight. The MSE for each method was calculated as the average of the squares of the differences between the actual average and the local weighted regression estimate at each point.

## RESULTS AND DISCUSSION

### Descriptive Statistics

A description of the data used in this study is summarized in Table 1. The data set included 4.90, 4.55, and 4.65% uncensored records (dead piglets) for stage 1 (from birth to d 3), 2 (from d 3 to 6), and 3 (from d 6 up to weaning), respectively. In total, 86% of records were censored (piglets alive at weaning) and censoring occurred at an average time of 28 d (weaning). Piglet mortality from birth to weaning was 14% and average failure time for uncensored records (dead piglets) was 6 d. A detailed description of the influence of nongenetic effects involved in the survival process is reported in Cecchinato et al. (2008).

### Variance Components and Heritability

Parameters of the approximated marginal posterior distributions of sire and nursed litter variance components and estimates of effective and equivalent heritability for Cox and WTD models are reported in Table 2. A discussion on the estimated variance components and heritabilities obtained with the 2 survival models is extensively reported in Cecchinato et al. (2008).

Summaries of the posterior distributions of the parameters for TM and STM are presented in Table 3. The estimated sire variances from both models were very similar (0.0116 and 0.0122). Regarding the nursed litter variance, the posterior mean from STM was slightly larger than that from TM (0.1667 vs. 0.1472). Besides, smaller posterior SD were obtained for both variances. Note that the STM accounts for variation in litter membership due to cross-fostering, whereas TM ignores it. The estimated nursed litter variance was much larger than the sire variance estimate in both models, confirming the nurse sow common environment as a key factor affecting the survival of piglets before weaning (Casellas et al., 2004; Cecchinato et al., 2007; Wolf et al., 2007). Estimates of heritability for preweaning mortality were small, which is in agreement with results of Kerr and Cameron (1995) and Roehe and Kalm (2000).

However, when comparing heritabilities, one must consider that the traits evaluated by the TM and survival analysis are different. Van Arendonk et al. (1996) and Knol et al. (2002) used linear models ignoring the categorical nature of the trait. Few studies addressed estimation of variance components for piglet mortality using TM (Grandison et al., 2002; Arango et al., 2005; Cecchinato et al., 2010) and, to our knowledge, only one modeled the trait at the individual piglet level (Grandison et al., 2002).

Previous studies have reported low estimates of heritability for piglet mortality or survival rate, with an average of 0.05 (at the level of the litter and as a trait of the sow), as reviewed by Rothschild and Bidanel (1998). There is large variation across estimates obtained in different studies. Lamberson and Johnson (1984) reported an estimate of heritability for preweaning survival of 0.03, whereas Ferguson et al. (1985) reported values of 0.14 and 0.18 in Yorkshire and Duroc, respectively.

### Predictive Ability of Models

Spearman’s (r_{s}) and Pearson’s (r_{p}) correlations between estimates of sire effects obtained with different models are presented in Table 4. Correlation ranged from 0.82 to 0.98 (in absolute values). The WTD had greater correlation with the Cox and the STM models (r_{s} = 0.98, r_{p} = 0.96 and r_{s} = −0.96, r_{p} = −0.97, respectively) than with TM. The correlations between STM with the other 3 models were negative because, for STM, the trait stands for survivability instead of risk of death (i.e., increased PTA estimate for survival models and TM indicates a larger probability of progeny mortality, whereas increased PTA for STM means larger probability to survive to next stage). Application of investigated methodologies resulted in similar sire rankings. The scientific literature shows, in general, a decreased rank correlation for sire PTA obtained from the analyses of continuous or categorical traits, indicating moderate re-rankings, in agreement with the present study. Boettcher et al. (1999) found PTA ranking correlations equal to 0.90 between threshold (binary trait) and survival models (continuous trait) for longevity in dairy cattle. Carlén et al. (2005) found sire PTA correlations of 0.93, 0.89, and 0.88 (for lactation 1 to 3) for mastitis incidence between survival analysis (time to first mastitis) and the binary trait (appearance of mastitis) analyzed using a mixed linear model. A strong correlation between WTD and Cox as well as between WTD and STM was expected because of the ability of these methods to handle time-dependent covariates. Similar results were reported by González-Recio et al. (2005) who investigated the number of services to conception in dairy cattle, with a discrete proportional hazard model and STM, and reported a correlation of −0.98 between sire rankings provided by the 2 models.

Results from the predictive cross-validation procedure for each of the methods are presented in Table 5. The STM provided more accurate predictions for piglet preweaning survival than other models tested. It resulted in the smallest sum of χ^{2} statistics in 3 out of 4 testing sets for stage 2 (from d 3 to 6) and birth to weaning period, outperforming WTD, Cox, and TM. Model STM exhibited also the best predictive ability for stage 1 (from birth to d 3) for 2 out of 4 testing sets, whereas WTD and Cox model were better in the other 2 testing sets. The Cox model seemed to provide better predictions for the third stage, where fewer piglets died in comparison with previous stages. In terms of sum of χ^{2} statistics across sire, difference between Cox and WTD was of similar magnitude in all stages. Casellas et al. (2006) reported that a Weibull baseline hazard function with the inclusion of time-dependent effects might approximate a smooth function similar to the Cox model. A time-dependent effect (in our case changing at d 6 and 12) may be difficult to interpret as an independent effect in the model, but it has to be understood as a new component of the survival baseline, adjusting the aging process of the whole population (Casellas, 2007). The TM showed the worst predictive ability among methods used and did not perform better than any other methods in any of the tests. This was expected because there is a loss of information when collapsing into a binary outcome. The cross-validation was performed over the most important preweaning stages from a productive point of view (0 to 3, 3 to 6, and 6 to 28 d), which were also used to check survivability with the STM. This might favor the STM; however, the STM also performed better in a global stage (0 to 28 d). The flexibility of STM to discern between productive stages is an important feature of this model. Besides, a conceptual appeal of STM is that parameters are easier to interpret; for example, heritability pertains to a linear, Gaussian scale of liability. This means that all the machinery of the infinitesimal model holds on the liability scale. This is not the case with the Cox and WTD models. Significance of these estimates is difficult to obtain. Bootstrapping techniques may help to assess significance, but they are much too time-consuming to be applied in large data set and Markov chain Monte Carlo procedures such those in this study.

### Goodness of Fit

Figure 1 shows the nonparametric fit relating progeny raw mortality to the sire PTA for each of the 4 models. There was an association between PTA and progeny mortality, and all the models seemed to have similar fitness. The Cox model performed better than others. A smaller bandwidth parameter was determined for this model, which led to a slight overfitting, as shown in the regression line in Figure 1. After trial-error learning, this parameter value was chosen because it provided the smallest MSE among those tested. The MSE for Cox model was 13% less than MSE for WTD (75.88 × 10^{−4} vs. 85.55 × 10^{−4}), indicating a greater flexibility of nonparametric compared with parametric approaches to fit survival data, although Cox implies a greater computational demand. Differences in computational time required between WTD and Cox models are substantial. The MSE from STM and TM were similar (85.51 × 10^{−4} vs. 83.62 × 10^{−4}). These are the first available results comparing goodness of fit of TM and STM of piglet preweaning survival.

In general, the Cox model showed better fit between progeny raw mortality and sire PTA than other models. Discrepancies between WTD and STM were minimal, less than 0.1% (MSE = 85.55 × 10^{−4} vs. 85.51 × 10^{−4}; Figure 1).

### Implications

Four methods (survival analyses and threshold models) to analyze preweaning survival in pigs were tested, leading to generally similar results. All 4 models led to similar ranking for sires, with strong correlation between methods. The STM had a better performance for predicting piglet survival but had a poorer performance in terms of goodness of fit than Cox. The sampling distribution of the χ^{2} statistic, used to test predictive ability, is unknown, and therefore, a very computationally demanding method, such as permutation test or bootstrapping, would be needed to assess their significance. However, consistency of better predictive ability throughout cross-validation folds was obtained for the STM, mainly in important productive stages such as survivability from birth to d 6 of life, where greater mortality occurs, and also in the global weaning period (birth to d 28).

The results in this study suggest that STM may, globally, be better than other methods tested, both for its better predictive ability of piglet survival in genetic evaluations and for its easier interpretation. Further, STM is computationally less demanding and can be extended to allow for different variance components by different period from birth to weaning.