Journal of Animal Science Abstract - Special Topics

Handling of missing data to improve the mining of large feed databases


This article in JAS

  1. Vol. 91 No. 1, p. 491-500
    Received: May 22, 2012
    Accepted: Aug 28, 2012
    Published: December 3, 2014

  1. F. Maroto-Molina (corresponding author),
  2. A. Gómez-Cabrera,
  3. J. E. Guerrero-Ginel,
  4. A. Garrido-Varo,
  5. D. Sauvant,
  6. G. Tran§,
  7. V. Heuzé§ and
  8. D. C. Pérez-Marín
    Feed Information Service (SIA), University of Cordoba, 14014 Cordoba, Spain
    Department of Animal Production, School of Agricultural and Forestry Engineering (ETSIAM), University of Cordoba, 14014 Cordoba, Spain
    UMR 791 Physiology of Nutrition and Feeding, AgroParisTech, 75231 Paris, France
    French Association of Animal Production (AFZ), AgroParisTech, 75231 Paris, France


Feed databases often have missing data. Despite their potentially major effect on data analysis (e.g., as a source of biased results and loss of statistical power), database managers and nutrition researchers have paid little attention to missing data. This study evaluated various methods of handling missing data using mining outputs from a database containing data on chemical composition and nutritive value for 18,864 alfalfa samples. A complete reference dataset was obtained comprising the 2,303 cases with no missing data for the attributes CP, crude fiber (CF), NDF, ADF and ADL. This dataset was used to simulate 2 types of missing data (at random and not at random), each with 2 loss intensities (33 and 66%), thus yielding a total of 4 incomplete datasets. Missing data from these datasets were handled using 2 deletion methods and 4 imputation methods, and outputs in terms of the identification and typing of alfalfa (using ANOVA and descriptive statistics) and of correlations between attributes (using regressions) were compared with outputs from the complete dataset. Imputation methods, particularly model-based versions, were found to perform better than deletion methods in terms of maximizing information use and minimizing bias although the extent of differences between methods depended on the type of missing data. The best approximation to the uncertainty value was provided by multiple imputation methods. It was concluded that the choice of the most suitable method for handling missing data depended both on the type of missing data and on the purpose of data analysis.
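
The simulate-then-compare design described above can be illustrated with a minimal sketch. It is not the authors' procedure. The data are synthetic stand-ins for two correlated attributes (the CP and NDF values and their relationship are invented for illustration), only the 33% loss intensity is shown, and the abstract does not name the individual deletion and imputation methods, so listwise deletion, mean imputation and regression imputation are assumed here as representative examples. All function names are hypothetical.

```python
import random
import statistics


def make_complete(n, seed=0):
    """Synthetic (CP, NDF) pairs; values are invented, not alfalfa data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        cp = rng.gauss(18.0, 2.0)                    # hypothetical CP, % DM
        ndf = 60.0 - 1.2 * cp + rng.gauss(0.0, 2.0)  # hypothetical NDF, % DM
        data.append((cp, ndf))
    return data


def lose_at_random(data, fraction, seed=1):
    """Every NDF value has the same chance of being lost."""
    rng = random.Random(seed)
    return [(cp, None if rng.random() < fraction else ndf) for cp, ndf in data]


def lose_not_at_random(data, fraction):
    """The highest NDF values are the ones lost (loss depends on the value)."""
    k = int(len(data) * fraction)
    if k == 0:
        return list(data)
    cutoff = sorted(ndf for _, ndf in data)[len(data) - k]
    return [(cp, None if ndf >= cutoff else ndf) for cp, ndf in data]


def listwise_deletion(data):
    """Deletion: keep only the cases where NDF was observed."""
    return [ndf for _, ndf in data if ndf is not None]


def mean_imputation(data):
    """Replace each missing NDF with the mean of the observed NDF values."""
    observed_mean = statistics.fmean(listwise_deletion(data))
    return [observed_mean if ndf is None else ndf for _, ndf in data]


def regression_imputation(data):
    """Model-based: predict missing NDF from CP using a least-squares line
    fitted on the complete cases."""
    xs = [cp for cp, ndf in data if ndf is not None]
    ys = [ndf for _, ndf in data if ndf is not None]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * cp if ndf is None else ndf for cp, ndf in data]


def bias_table(n=2000, fraction=0.33):
    """Bias of the estimated mean NDF relative to the complete dataset."""
    complete = make_complete(n)
    true_mean = statistics.fmean([ndf for _, ndf in complete])
    scenarios = {
        "at random": lose_at_random(complete, fraction),
        "not at random": lose_not_at_random(complete, fraction),
    }
    methods = {
        "listwise deletion": listwise_deletion,
        "mean imputation": mean_imputation,
        "regression imputation": regression_imputation,
    }
    return {
        (scenario, method): statistics.fmean(handle(incomplete)) - true_mean
        for scenario, incomplete in scenarios.items()
        for method, handle in methods.items()
    }


for (scenario, method), bias in bias_table().items():
    print(f"{scenario:>13} | {method:<21} | bias in mean NDF: {bias:+.2f}")
```

In this toy setting, all three methods should recover the mean closely when values are lost at random. When the highest values are lost, deletion and mean imputation give the same biased mean, while the regression-based fill recovers part of the lost information through the correlated attribute. This matches the direction of the abstract's conclusion, but the magnitudes are artifacts of the synthetic data and say nothing about the study's actual results.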

Copyright © 2013. American Society of Animal Science