Comparison of Models for some Testicular Characteristics in Karakaş Male Lambs

Kadir Karakuş1, Turgut Aygün2, Şenol Çelik3, Mohammad Masood Tariq4*, Muhammad Ali4, Majed Rafeeq4 and Farhat Abbas Bukhari4
1Department of Animal Science, Faculty of Agriculture, Malatya Turgut Özal University, Malatya, Turkey
2Department of Animal Science, Faculty of Agriculture, Van Yüzüncü Yıl University, Van-Turkey
3Biometry Genetics Unit, Department of Animal Science, Faculty of Agriculture, Bingöl University, Bingöl-Turkey
4Center for Advanced Studies in Vaccinology and Biotechnology, University of Balochistan, Quetta, Balochistan, Pakistan
Article Information
Received 12 February 2019
Revised 22 May 2019
Accepted 10 June 2019
Available online 06 December 2019


INTRODUCTION
Testis characteristics (testis diameter, testis length, scrotum circumference and scrotum length) are used as indirect selection criteria in breeding studies aimed at increasing fertility (Yılmaz and Aygün, 2002). Testis characteristics are highly correlated with scrotal circumference and spermatological features. Because they can be measured in the early stages of growth, have high heritability, and are related to spermatogenesis and to the number of eggs obtained from females, they are important in genetic breeding studies (Bilgin et al., 2004; Yanli et al., 2017).
Significant correlations have been reported between testis characteristics (Salhab et al., 2001). Factors affecting testis development include species, rearing system, season, age, body weight and hormones (Aygun et al., 1999; Gundogan et al., 2002; Karakus et al., 2016). Testicular characteristics and hormone levels also differ with seasonal changes (Milczewski et al., 2015). Among these factors, age and body weight have been reported to have significant effects on scrotal circumference (Yılmaz and Aygün, 2002). Studies have been conducted in different breeds to determine the relationships among sexual activity, growth and testis characteristics in early life (Belibasaki and Kouimtzis, 2000). Other studies have compared growth models to describe the development of scrotal circumference in sheep and cattle (Bilgin et al., 2004; Quirino et al., 1999; Loaiza-Echeverri et al., 2013; Santana et al., 2015). Different statistical analysis methods were used in these studies (Karakuş et al., 2010). Many studies have examined the relationship between live weight, age and testis characteristics in lambs of different breeds (Sarı et al., 2013; Celik et al., 2017). However, only a few studies have used non-linear models to describe the growth of scrotal circumference in Karakaş male lambs, and almost none have used recently developed models.
The aim of this study is to investigate the effects of birth weight, birth type, age of dam, age at control, testicular diameter, testicular length, scrotum circumference and scrotum length on live weight in Karakaş lambs using the MARS, CHAID, Exhaustive CHAID and CART algorithms.

Material
In this study, Karakaş male lambs (n = 39) raised under intensive conditions at the Research and Practice Farm of Van Yüzüncü Yıl University were used as animal material. The effects of genotype and some environmental factors, such as birth type, dam age, lamb age at control and live weight, on testis characteristics were also investigated.

Methods
Testis diameter, testis length, scrotum circumference and scrotum length were measured in the male lambs as testis characteristics.
The chi-squared automatic interaction detection (CHAID) algorithm, originally proposed by Kass (1980) and further developed by Magidson (1993), is a well-known and widely used decision tree (DT) algorithm that constructs a tree by recursive partitioning.
In CHAID trees, the homogeneity of the groups generated by the tree is evaluated by a Bonferroni-corrected p-value obtained from the chi-square statistic applied to two-way classification tables with C classes and K splits for each tree node (Maroco et al., 2011):

$$\chi^2 = \sum_{c=1}^{C} \sum_{k=1}^{K} \frac{\left(n_{ck} - \tilde{n}_{ck}\right)^2}{\tilde{n}_{ck}}$$

where $n_{ck}$ is the observed frequency of cell $(c, k)$ and $\tilde{n}_{ck}$ is the expected frequency under the null hypothesis of two-way homogeneity.
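As an illustration of the statistic described above, the following minimal Python sketch computes the Pearson chi-square for a two-way table of observed counts and applies a simple Bonferroni adjustment to a given p-value. The function names and the example counts are hypothetical, and the multiplier passed to the adjustment is an assumption: CHAID implementations derive it from the number of possible category mergers, which is not reproduced here.

```python
def chi_square_statistic(table):
    """Pearson chi-square for a C x K table of observed counts.

    `table` is a list of rows; all row and column totals are assumed
    to be positive so that every expected frequency is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for c, row in enumerate(table):
        for k, observed in enumerate(row):
            # Expected frequency under two-way homogeneity.
            expected = row_totals[c] * col_totals[k] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def bonferroni_adjust(p_value, n_comparisons):
    """Bonferroni-corrected p-value, capped at 1 (multiplier is an assumption)."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical 2 x 2 table of counts, for illustration only.
example = [[10, 20], [20, 10]]
print(chi_square_statistic(example))
```

A split is considered more informative the larger the statistic is, and therefore the smaller its adjusted p-value.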
Exhaustive CHAID (Biggs et al., 1991) has the same splitting and stopping steps as CHAID, but its merging step is more thorough: it continues to merge categories of the predictor variable until only two supercategories are left.
CART (Classification and Regression Tree) is a machine learning method proposed by Breiman et al. (1984). Suppose that $x_1, x_2, \ldots, x_n$ are the input variables and that $y$ is the output variable for a training dataset $D$ with $n$ input variables and $m$ samples:

$$D = \{(x_{11}, x_{12}, \ldots, x_{1n}, y_1), (x_{21}, x_{22}, \ldots, x_{2n}, y_2), \ldots, (x_{m1}, x_{m2}, \ldots, x_{mn}, y_m)\}$$

CART splits $D$ into a certain number of subspaces using a binary recursive process. Every subspace has an estimated value of $y$ determined by least squares fitting; the optimal splitting variable $j$ and splitting point $s$ are selected so that the binary division has the minimum residual variance, as follows:

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$

where $R_1(j,s) = \{x \mid x_j \le s\}$ and $R_2(j,s) = \{x \mid x_j > s\}$ are the two child regions and $c_1$, $c_2$ are their fitted values. Each child node is treated as a potential parent node in the next division process until homogeneous divisions, or terminal nodes, are obtained (Herold et al., 2003).
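The least-squares split search described above can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used in the study: the function names are hypothetical, candidate split points are taken as midpoints between consecutive distinct values (an assumption), and the data are invented for demonstration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (j, s, total_sse) for the split minimising child-node SSE.

    X is a list of rows (one row per sample), y the matching responses.
    """
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        values = sorted(set(row[j] for row in X))
        # Candidate split points: midpoints between distinct values.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best


# Hypothetical data: the first variable separates the responses cleanly.
X = [[1, 5], [2, 3], [3, 8], [4, 1]]
y = [10, 10, 20, 20]
print(best_split(X, y))
```

Applying the same search recursively to each child node, subject to a stopping rule, yields the full regression tree.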
Multivariate Adaptive Regression Splines (MARS) is a form of multivariable nonparametric regression analysis introduced by Friedman (1991). The basic idea is to sum segments of spline basis functions (BFs) to form a flexible MARS prediction model, to determine the values of the model parameters by cross-validation, and to assess lack of fit with a selection criterion in order to obtain the most suitable set of variables, knots and interactions for high-dimensional data problems (Friedman, 1991).
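A common form of the MARS model is

$$f(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[s_{km}\left(x_{v(k,m)} - t_{km}\right)\right]_+$$

in which each basis function (BF) is a product of truncated linear segments whose form changes according to the data.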
Here, $a_0$ and $a_m$ are the parameter values; $M$ is the number of BFs, determined by the selection criterion; $K_m$ is the number of knots; $s_{km}$ takes the value +1 or -1 and indicates the direction; $v(k,m)$ is the variable label; and $t_{km}$ is the knot location (Steinberg et al., 1999). The optimal MARS model is selected in a two-stage process. In the first stage, MARS constructs a very large number of basis functions to deliberately over-fit the data; variables may enter as continuous, categorical or ordinal, and they may interact with each other or be restricted to enter only as additive components. In the second stage, basis functions are deleted in order of least contribution using the generalized cross-validation (GCV) criterion (Friedman, 1991). GCV is given by (Kornacki and Ćwik, 2005):

$$GCV(M) = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{f}_M(x_i)\right)^2}{\left(1 - \frac{C}{N}\right)^2}$$

with

$$C = 1 + cd$$

where $N$ is the number of cases in the data set, $y_i$ is the observed value, $\hat{f}_M(x_i)$ is the value predicted by the model with $M$ basis functions, and $d$ is the effective degrees of freedom, which is equal to the number of independent basis functions. The quantity $c$ is the penalty for adding a basis function. Experiments have shown that the best value for $c$ lies in the range $2 < c < 3$ (Hastie et al., 2001).
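To make the notation concrete, the sketch below evaluates a MARS-type model built from truncated linear (hinge) basis functions and computes the GCV criterion with C = 1 + cd. It is an illustrative Python sketch under stated assumptions: the function names, coefficients, knots and data are hypothetical and are not taken from the fitted model, and the default penalty c = 3 is only one value from the range quoted above.

```python
def hinge(x, knot, sign):
    """Truncated linear segment [sign * (x - knot)]_+ with sign = +1 or -1."""
    return max(0.0, sign * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a_0 + sum_m a_m * prod_k hinge(x[v], t, s).

    `terms` is a list of (coefficient, factors), where each factor is a
    (variable_index, knot, sign) triple. All values are hypothetical.
    """
    total = intercept
    for coefficient, factors in terms:
        product = 1.0
        for variable_index, knot, sign in factors:
            product *= hinge(x[variable_index], knot, sign)
        total += coefficient * product
    return total


def gcv(y_true, y_pred, d, c=3.0):
    """Generalized cross-validation with complexity C = 1 + c * d."""
    n = len(y_true)
    mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n
    complexity = 1 + c * d
    return mse / (1 - complexity / n) ** 2


# Hypothetical model: one main-effect term and one two-way interaction.
terms = [
    (2.0, [(0, 3.0, 1)]),
    (0.5, [(0, 3.0, 1), (1, 4.0, -1)]),
]
print(mars_predict([5.0, 2.0], 1.0, terms))
```

During backward pruning, the basis function whose removal gives the smallest GCV is deleted at each step, so a larger penalty c favours smaller models.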
In order to comparatively test the predictive performance of data mining algorithms, the following goodness of fit criteria were calculated (Willmott and

RESULTS
Descriptive statistics of the body measurements of Karakaş male lambs are given in Table I. The goodness of fit statistics used to identify the most appropriate method for building the live weight prediction model are given in Table II. As Table II shows, the MARS method has the highest r, R² and Adj. R² values and the lowest SD ratio, RMSE, MAPE and AIC values among the methods compared. According to these criteria, the best method is MARS. The MARS model with the smallest GCV was obtained with fourth-order interactions for Karakaş male lambs. The GCV value of this MARS model was 8.6. The goodness of fit criteria (R² = 0.930, Adj. R² = 0.919, SD ratio = 0.265, RMSE = 2.177, MAPE = 5.796 and AIC = 416) showed that the model based on the MARS algorithm had the highest predictive accuracy. In addition, the correlation coefficient of 0.964 indicated very strong agreement between the observed and the fitted LW values for the MARS model (t = 54.767, df = 227, p-value < 2.2e-16). Ranked by predictive performance in estimating the live weight of Karakaş male lambs, the models were MARS > CART > CHAID > Exhaustive CHAID. The variable selection results and the basis functions obtained for the MARS prediction model are shown in Table III. The 30 basis functions used play an important role in determining the MARS prediction model. In addition, the basis functions obtained and the MARS prediction function provide important information about the predictor variables.
The LW expression in terms of the 30 basis functions of the MARS model is presented below. The importance of the independent variables for predicting LW is given in Table IV. Note: This equation is the MARS model equation; it was obtained from the independent variables and interactions used to construct the model.
The relative importance of each variable in the model is ordered from largest to smallest. The variable with the greatest relative importance is TESLENG (Table IV).
The code for the "earth" package of the R software used for the statistical analysis of the MARS algorithm is shown in the Appendix.
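For readers who wish to reproduce criteria of the kind reported in Table II, the following Python sketch computes r, R², Adj. R², SD ratio, RMSE and MAPE from observed and predicted values. The definitions used are common textbook forms and are an assumption, since the exact formulas applied in the study are not reproduced in this text; AIC is omitted because several definitions are in use. The function name and the example values are hypothetical.

```python
import math


def goodness_of_fit(y_true, y_pred, n_params):
    """Common goodness-of-fit criteria (assumed textbook definitions)."""
    n = len(y_true)
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    mean_p = sum(y_pred) / n
    mean_e = sum(residuals) / n

    sse = sum(e * e for e in residuals)
    sst = sum((yt - mean_y) ** 2 for yt in y_true)
    ssp = sum((yp - mean_p) ** 2 for yp in y_pred)
    cov = sum((yt - mean_y) * (yp - mean_p) for yt, yp in zip(y_true, y_pred))

    r2 = 1 - sse / sst
    sd_residuals = math.sqrt(sum((e - mean_e) ** 2 for e in residuals) / (n - 1))
    sd_observed = math.sqrt(sst / (n - 1))
    return {
        "r": cov / math.sqrt(sst * ssp),
        "R2": r2,
        "AdjR2": 1 - (1 - r2) * (n - 1) / (n - n_params - 1),
        "SDratio": sd_residuals / sd_observed,
        "RMSE": math.sqrt(sse / n),
        # MAPE in percent; observed values are assumed to be non-zero.
        "MAPE": 100 / n * sum(abs(e / yt) for e, yt in zip(residuals, y_true)),
    }


# Hypothetical observed and predicted live weights, for illustration only.
fit = goodness_of_fit([10, 20, 30, 40], [12, 18, 33, 37], n_params=1)
print(fit)
```

Higher r, R² and Adj. R² and lower SD ratio, RMSE and MAPE indicate a better fit, which is the basis on which the four algorithms are ranked.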

The regression tree diagrams of the CHAID, Exhaustive CHAID and CART algorithms are presented in Figures 1, 2 and 3, respectively.

DISCUSSION
When similar studies on live weight estimation in animal husbandry are examined, differences in estimation performance may arise from the independent variables (number of animals, species, breed, age, type of birth, age of dam and sex), the algorithms used and some restrictions placed on these algorithms (minimum number of nodes, depth of the tree structure). Eyduran (2016) stated that if there is a multicollinearity problem in multiple regression for sheep weight estimation, CART, CHAID, Exhaustive CHAID, MLP and MARS would be alternatives to multiple regression analysis. Aksoy et al. (2018) reported that MARS produced higher estimation performance than multiple regression. In another study on live weight estimation (Yakubu, 2012), the R² value of the CART algorithm applied to Uda rams was 0.62, which differs from our study. Khan et al. (2014) reported an R² of 0.84 for the CHAID algorithm in Harnai sheep. In another study, Ali et al. (2015) found the R² in the CART, CHAID and Exhaustive CHAID algorithms for the [...] Eyduran et al. (2017b) found an R² of 0.88 for the MARS algorithm in Mengali rams. Balta and Topal (2018) reported an R² of 0.862 for the CART algorithm in Hemşin lambs. Olfaz et al. (2018) found, for Karayaka lambs, an R² of 0.88 for the CART and CHAID algorithms, with RMSE values of 1.612 and 1.623, respectively. These results also support the findings of our study.

CONCLUSION
In research on the live weight of male lambs, data mining methods such as the CHAID, Exhaustive CHAID, CART and MARS algorithms were found to be very useful. The MARS model displayed the best predictive capability. The knot values found for the BW (4.2 kg), TDIA (4.8) and TESLENG (8.5) measurements in the MARS model may serve as reference points for future studies on lambs. The finding that the MARS algorithm has very high predictive accuracy in predicting LW from body characteristics may offer a new viewpoint for lamb breeders.

Statement of conflict of interest
The authors declare that there is no conflict of interest.