
Journal of Animal Science - Animal Genetics

Justification for setting the individual animal genotype call rate threshold at eighty-five percent

 

Vol. 94 No. 11, p. 4558-4569

Received: July 08, 2016
Accepted: Sept 08, 2016
Published: October 27, 2016


    2 Corresponding author(s): donagh.berry@teagasc.ie

doi:10.2527/jas.2016-0802
D. C. Purfield,* M. McClure,† and D. P. Berry*,2
*Animal & Grassland Research and Innovation Center, Teagasc, Moorepark, Fermoy, Co. Cork, Ireland
†Irish Cattle Breeding Federation, Bandon, Co. Cork, Ireland

Abstract

Data quality of SNP arrays impacts the accuracy and precision of downstream data analyses. One quality control measure often imposed is a threshold on individual animal call rate. Different call rate thresholds have been applied across studies; little is known, however, about the impact of these thresholds on the quality of the genotype data. The objective of the present study was to investigate the effect of different call rate thresholds on the integrity of the genotypes and to quantify the contribution of different factors to the variability in animal call rate. Data included 142,342 samples genotyped on a custom Illumina genotype panel from 141,591 dairy and beef cattle; the number of Illumina SNP on the panel was 14,371. The mean animal call rate across all samples was 99.09%; 487 animals had both a low call rate (<99%) and a subsequent high call rate (≥99%) after resampling and regenotyping. Several factors were associated (P < 0.001) with individual call rate, including animal sex, the sampling herd, the date of genotyping, the genotyping plate, and the plate well. The genotype and allele concordance between the genotypes of the 487 low– and high–call rate individuals improved at a diminishing rate as mean animal call rate increased. Mean genotype and allele concordance rates of 0.987 and 0.997, respectively, were observed when animal call rate was between 85 and 90%, increasing to 0.998 and 0.999, respectively, when animal call rate was between 95 and <99%. The mean within-animal allele concordance rate of rare variants (i.e., minor allele frequency < 0.05) between low and high genotype call rate animals increased as animal call rate improved; an allele concordance rate of 1.00 was achieved when animal call rate was between 85 and <99%.
The accuracy of imputation of the nonobserved genotypes in the low–call rate animals improved as animal call rate increased; the mean genotype concordance rate of the imputed nonobserved SNP was 0.41 when animal call rate was <40% but increased to 0.95 when animal call rate was between 95 and <99%. Parentage validation, determined by the count of opposing homozygotes in a parent–progeny pair, was unreliable when animal call rate was <85%. Therefore, to ensure the provision of high-quality genotypes while also considering the cost and inconvenience of resampling and regenotyping, we suggest a minimum animal call rate threshold of 85%.



INTRODUCTION

The use of genomic information to aid breeding (Hayes et al., 2009; Spelman et al., 2013) and management (Kinghorn, 2012; Berry, 2015) decisions is intensifying in agriculture. Such advancements have been achievable through the development of statistical algorithms (Meuwissen et al., 2001; Habier et al., 2011) coupled with the development of genotyping panels (Matukumalli et al., 2009; Boichard et al., 2012), facilitating the generation of vast quantities of low-cost genotype information.

Data quality control measures are usually enforced prior to any analysis, and genomic analyses are no different. Noninformative SNP (i.e., SNP with a low minor allele frequency [MAF]) as well as SNP deviating from Hardy–Weinberg equilibrium or with poor call rates are generally discarded prior to data analyses (Berry and Kearney, 2009). Several studies also impose a minimum call rate at the individual level (Hayes et al., 2010; Pausch et al., 2014); such an edit is applied to ensure data integrity of the individual’s genotypes that have been called. Individuals that fail to achieve the minimum individual call rate are generally not included in the subsequent analyses and, therefore, require resampling and regenotyping, which incur both an inconvenience and cost to the breeder.

Different threshold call rates per individual have been applied across studies in cattle, varying from 0.80 (Hayes et al., 2010; Boddhireddy et al., 2014) to 0.90 (Meredith et al., 2013; Chud et al., 2015) or even 0.95 (Pausch et al., 2014; Purfield et al., 2015). Few studies, in cattle at least, have quantified the contribution of different factors to the variability in individual call rate or determined the effect of call rate thresholds on genotype integrity (Cooper et al., 2013). Therefore, the aim of the present study was to determine the effect of different call rate thresholds on data quality and to quantify the contribution of different factors to the variability in call rate.


MATERIALS AND METHODS

Animal Care and Use Committee approval was not obtained for this study because the data were from an existing database.

Genotypes

A total of 144,672 samples from 143,827 dairy and beef cattle, genotyped using the custom Illumina International Dairy and Beef (IDB) genotyping panel version 2 (Illumina Inc., San Diego, CA), were available. The IDB genotyping panel comprises 16,223 SNP, including 14,371 Illumina SNP chosen from either the existing Bovine SNP50 (Matukumalli et al., 2009) or high-density genotyping panels. The Illumina SNP chosen included those on the original Illumina low-density genotype panel (Boichard et al., 2012) as well as SNP chosen to aid imputation to a higher density genotype in beef cattle; the latter SNP had to have a high call rate and a good GenTrain score (i.e., high-resolution clustering of homozygotes and heterozygotes; http://www.illumina.com; Fan et al., 2003) and had to be segregating in both dairy and beef breeds predominating in Ireland. The additional Illumina SNP on the panel were chosen to aid imputation to microsatellites (McClure et al., 2013). Single nucleotide polymorphisms in genes with cited major effects (e.g., myostatin; Kambadur et al., 1997) or known to cause congenital defects (424 SNP), as well as 1,873 SNP chosen as part of ongoing research projects, were also included on the platform. Only the 14,371 Illumina SNP, excluding the 13 mitochondrial and 9 Y chromosome SNP, were retained for use in the present study.

Deoxyribonucleic acid extraction from ear tissue samples and genotyping were performed by a single company, Weatherbys Ireland Ltd. (Naas, Ireland), during the years 2014 and 2015. Genotypes were called using the GenCall (GC) method provided by Illumina in the BeadStudio/GenomeStudio software (Illumina Inc.). This genotype calling algorithm uses the intensity values of the fluorescently labeled nucleotides to assign the genotype at each locus to a genotype cluster. The generated GC score is a confidence measure assigned to each genotype call for each individual and can be used to filter poor-quality calls, SNP, or samples. A GC score below 0.15 generally indicates a failed genotype, and such genotypes were declared missing in the present study. Call rate in the present study was therefore defined as the ratio of called SNP to the total number of SNP per individual. For samples to be included in the Irish national genomic evaluations, a minimum animal call rate threshold of 90% is imposed; therefore, an additional sample was obtained and genotyped for individuals that initially failed to achieve the threshold call rate. Individual SNP call rate was estimated across all genotyped samples as the number of called genotypes at each SNP divided by the total number of samples.
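The two call rate definitions above can be illustrated with a short Python sketch. It assumes, hypothetically, that GC scores are available as a list of per-sample rows (samples × SNP) and applies the 0.15 GC cutoff from the text; the function names and data layout are ours and are not part of the GenomeStudio software.

```python
# Sketch: animal (per-sample) and SNP call rates from a samples x SNP matrix of GC scores.
# Assumption: a genotype counts as "called" when its GC score is >= 0.15.
GC_THRESHOLD = 0.15


def animal_call_rates(gc_scores):
    """Fraction of SNP called for each sample (row)."""
    return [sum(score >= GC_THRESHOLD for score in row) / len(row) for row in gc_scores]


def snp_call_rates(gc_scores):
    """Fraction of samples with a called genotype at each SNP (column)."""
    n_samples = len(gc_scores)
    return [
        sum(row[j] >= GC_THRESHOLD for row in gc_scores) / n_samples
        for j in range(len(gc_scores[0]))
    ]


# Toy GC scores for 3 samples x 4 SNP (made-up values for illustration only).
scores = [
    [0.80, 0.75, 0.90, 0.10],
    [0.05, 0.60, 0.12, 0.70],
    [0.85, 0.88, 0.91, 0.79],
]
rates = animal_call_rates(scores)
print(rates)  # [0.75, 0.5, 1.0]
# Samples below the 90% national evaluation threshold would need resampling.
print([r >= 0.90 for r in rates])  # [False, False, True]
```

In practice, the same logic would run over the full 14,371-SNP matrix, but the per-row and per-column ratios are identical.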

For each sample, information was available on the date the sample was analyzed, the plate number, and the plate well the sample was in. The gender, the herd identification number, the age of the animal at genotyping, and the breed of the animal were also available from the Irish Cattle Breeding Federation database (www.icbf.com). The age of the animal at genotyping was grouped into yearly blocks with the exception of animals that were under 6 mo and greater than 10.5 yr at the time of genotyping, which were separately grouped. Genotyping dates where >15% of the samples genotyped had a call rate < 90% (4 dates) were not considered further. Call rate categories for each sample were defined as <40%, 40 to <50%, 50 to <60%, 60 to <70%, 70 to <75%, 75 to <80%, 80 to <85%, 85 to <90%, 90 to <95%, 95 to <99%, and ≥99%. Of the remaining 142,342 samples, 487 animals had both a poor call rate (<99%) and a subsequent high call rate (≥99%) after resampling and regenotyping.
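The categorization and date-exclusion rules above are easy to get wrong at category boundaries, so a minimal Python sketch follows. The input layout and function names are hypothetical; only the thresholds (the category bounds, the 90% call rate, and the 15% failure fraction) come from the text.

```python
import bisect

# Lower bounds (in %) of the call rate categories defined above; <40% is the first class.
LOWER_BOUNDS = [40, 50, 60, 70, 75, 80, 85, 90, 95, 99]
LABELS = ["<40", "40 to <50", "50 to <60", "60 to <70", "70 to <75", "75 to <80",
          "80 to <85", "85 to <90", "90 to <95", "95 to <99", "≥99"]


def call_rate_category(call_rate_pct):
    """Return the category label; each lower bound is inclusive."""
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, call_rate_pct)]


def dates_to_exclude(records, threshold=90.0, max_fail_fraction=0.15):
    """Genotyping dates on which more than 15% of samples had a call rate < 90%.

    records: iterable of (genotyping_date, call_rate_pct) pairs (hypothetical layout).
    """
    totals, fails = {}, {}
    for date, call_rate in records:
        totals[date] = totals.get(date, 0) + 1
        if call_rate < threshold:
            fails[date] = fails.get(date, 0) + 1
    return {d for d, n in totals.items() if fails.get(d, 0) / n > max_fail_fraction}


print(call_rate_category(84.99), "|", call_rate_category(85.0))  # 80 to <85 | 85 to <90
```

Using `bisect_right` makes each lower bound inclusive, so a sample at exactly 85% falls into the 85 to <90% class rather than the one below.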

Statistical Analyses

Factors associated with poorer call rate on the first genotyped sample of every animal (n = 141,591) were modeled in ASReml (Gilmour et al., 2015) assuming a multinomial distribution of the data and a cumulative logit link function. Fixed effects considered for inclusion in the model were gender of the animal (2 genders), the age of the animal at genotyping grouped as the first 6 mo and then in yearly intervals (12 age groups), herd at sampling (36,628 herds), genotyping date (88 genotyping dates), plate well (96 wells), and the sample plate (1,917 plates) nested within genotyping date. Genetic and residual variance components of individual animal call rate for the first genotyped sample of every animal (n = 141,591) were estimated using a single-trait animal linear mixed model in ASReml. All previously identified significant factors were included in the model. The pedigree of all animals was traced back to the founder population, and founder animals were allocated to breed groups based on breed. The pedigree consisted of 573,061 animals, and breed effects were accounted for through the use of breed genetic groups.
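For reference, one generic way to write the two models above is given below. The notation is ours, and the parameterization inside ASReml may differ (for example, in the sign of the linear predictor). Here Y_i is the ordered call rate category of sample i (k = 1 being the poorest), θ_k are the category thresholds, x_i holds the fixed effects listed above, and σ²_a and σ²_e are the additive genetic and residual variances from the separate linear animal model.

```latex
\operatorname{logit} \Pr(Y_i \le k)
  = \ln \frac{\Pr(Y_i \le k)}{1 - \Pr(Y_i \le k)}
  = \theta_k + \mathbf{x}_i^{\top} \boldsymbol{\beta},
  \qquad k = 1, \dots, K - 1

\mathrm{OR}_j = \exp(\beta_j),
\qquad
h^2 = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}
```

Under this sign convention, OR_j is the odds of falling in a poorer call rate category for level j of a factor relative to its reference level (e.g., males vs. females), which matches how the odds ratios in the Results are described; h² is the heritability of call rate from the linear animal model.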

To quantify the impact of call rate on genotype accuracy, the mean genotype and allele concordance rates per individual were estimated for all 487 animals with both a low and a subsequent high call rate. Genotype concordance per animal was defined as the number of identical SNP genotype calls (excluding those with missing genotypes) divided by the number of SNP that were called in both samples. The allele concordance rate was defined as the average proportion of correctly called alleles within an animal; a genotype called heterozygous that was truly homozygous was assumed to have 1 correct allele called. The heterozygosity rate per animal was calculated on the first genotyped sample of every animal as (N − O)/N, in which N was the number of nonmissing genotypes and O was the observed number of homozygous genotypes for an individual.
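A minimal Python sketch of the three per-animal statistics defined above follows. It assumes, as an illustration, that genotypes are coded 0, 1, or 2 (copies of one allele) with None for a missing call; the coding and function names are ours, not the authors' code.

```python
def concordance(low, high):
    """Genotype and allele concordance between two samples of the same animal.

    Genotypes are coded 0/1/2 with None for a missing call (assumed coding).
    Only SNP called in both samples are compared.
    """
    pairs = [(a, b) for a, b in zip(low, high) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no SNP called in both samples")
    genotype_cr = sum(a == b for a, b in pairs) / len(pairs)
    # Correct alleles per SNP: 2 if identical, 1 if heterozygous vs. homozygous,
    # 0 for opposing homozygotes; averaged over 2 alleles per SNP.
    allele_cr = sum(2 - abs(a - b) for a, b in pairs) / (2 * len(pairs))
    return genotype_cr, allele_cr


def heterozygosity_rate(genotypes):
    """(N - O) / N with N nonmissing genotypes and O observed homozygotes."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    o = sum(g != 1 for g in called)
    return (n - o) / n


print(concordance([0, 1, 2, None, 1], [0, 2, 0, 1, 1]))  # (0.5, 0.625)
print(heterozygosity_rate([0, 2, 0, 1, 1]))  # 0.4
```

In the toy call, 4 SNP are called in both samples: 2 match exactly, 1 differs by a heterozygote (1 correct allele), and 1 is an opposing homozygote (0 correct alleles), giving 5 of 8 alleles concordant.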

Imputation

The ability of imputation algorithms to accurately impute missing genotypes in samples with poor call rates was evaluated using FImpute2 version 2.2 (Sargolzaei et al., 2014), exploiting both family-based and population-wide information. Only autosomal or non-pseudoautosomal X chromosome SNP with known positions were retained, and SNP with duplicate positions were removed; 13,416 SNP remained. Imputation of the 487 low–call rate samples was undertaken using a reference population of 140,268 animals with call rates ≥ 90%. The 487 high–call rate samples were excluded from the reference population. Imputation accuracy was measured as the mean genotype and allele concordance rates per individual between the imputed low–call rate genotypes and the corresponding high–call rate genotypes, and 2 estimates were calculated: first, imputation accuracy across just the nonobserved genotypes, and second, imputation accuracy across all SNP.
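The distinction between the 2 accuracy estimates can be sketched in Python as follows. This is an illustrative sketch only: it assumes genotypes coded 0/1/2 with None for missing, and it computes genotype concordance; allele concordance would follow the same pattern with 2 − |a − b| per SNP. Function and argument names are hypothetical.

```python
def imputation_accuracy(observed_low, imputed_low, reference_high):
    """Genotype concordance after imputation, on nonobserved SNP and on all SNP.

    observed_low:   low call rate genotypes before imputation (None = not observed)
    imputed_low:    the same sample after imputation (no missing values)
    reference_high: genotypes of the high call rate resample (None = missing)
    SNP missing in the high call rate genotypes are skipped.
    """
    nonobserved, all_snp = [], []
    for obs, imp, ref in zip(observed_low, imputed_low, reference_high):
        if ref is None:
            continue
        match = imp == ref
        all_snp.append(match)
        if obs is None:  # this genotype was filled in by imputation
            nonobserved.append(match)
    nonobs_cr = sum(nonobserved) / len(nonobserved) if nonobserved else float("nan")
    return nonobs_cr, sum(all_snp) / len(all_snp)


print(imputation_accuracy([0, None, 2, None, 1], [0, 1, 2, 2, 1], [0, 1, 2, 0, 1]))  # (0.5, 0.8)
```

The all-SNP figure is diluted by the observed genotypes, which is why it sits between the before-imputation and nonobserved-only concordance rates.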

To quantify the impact of having a direct relative (i.e., sire, dam, or progeny) in the reference population, individuals that were classified as a direct relative of the 487 low– and high–call rate animals were removed from the reference population. In total, 141 direct relatives of 133 low– and high–call rate animals were identified and removed, leaving 140,127 animals in the reference population. Imputation was undertaken again, and genotype and allele concordance were re-estimated for all 133 animals.
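Identifying and removing direct relatives from a reference population can be sketched in Python as below. The pedigree layout (a dict of animal ID to a sire/dam pair) and the function names are hypothetical assumptions for illustration.

```python
def direct_relatives(targets, pedigree):
    """IDs of the sires, dams, and progeny of the target animals.

    pedigree: dict mapping animal ID -> (sire ID or None, dam ID or None).
    """
    targets = set(targets)
    relatives = set()
    for animal, (sire, dam) in pedigree.items():
        if animal in targets:  # parents of a target animal
            relatives.update(p for p in (sire, dam) if p is not None)
        if sire in targets or dam in targets:  # progeny of a target animal
            relatives.add(animal)
    return relatives - targets


def prune_reference(reference_ids, targets, pedigree):
    """Reference population with all direct relatives of the targets removed."""
    relatives = direct_relatives(targets, pedigree)
    return [a for a in reference_ids if a not in relatives]


pedigree = {"A": ("S1", "D1"), "C1": ("A", "D2"), "X": ("S2", "D3")}
print(prune_reference(["S1", "D1", "C1", "X", "D2"], ["A"], pedigree))  # ['X', 'D2']
```

Only first-degree links are followed, so the mate of a target animal (D2 here) stays in the reference population.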

Parentage

Mendelian inconsistencies were quantified as the number of opposing homozygous SNP genotypes between a parent–progeny pair divided by the number of homozygous SNP used in the comparison. Only autosomal SNP mapped to a chromosome were used for verification (14,114 SNP). Parent/progeny genotypes of all 141,591 animals were obtained, where possible, from 3 different genotyping panels (IDB, Illumina BovineSNP50, and Illumina high-density panel). In total, 58,020 genotyped parent–progeny pairs were identified. The threshold number of Mendelian inconsistencies permitted for declaring a parent–progeny relationship inconsistent was defined using the method of Calus et al. (2011). This approach used the realized distribution of the number of opposing homozygotes in all genotyped pairs of animals that were parent–offspring according to their pedigree to define the threshold number of Mendelian inconsistencies. Based on the distribution of the 58,020 genotyped pairs of animals, it was assumed that for all parent–progeny pairs with more than 75 (i.e., >0.53%) opposing homozygous loci, a conflict existed between the pedigree and SNP data and the relationship could no longer be verified (Supplementary Fig. S1; see the online version of the article at http://journalofanimalscience.org). Of the 487 low– and high–call rate animals, 332 parent–progeny relationships were identified. To ensure only verified parent–progeny relationships were used for analyses, the percentage of Mendelian inconsistencies between each parent–progeny pair was first determined using the high–call rate (≥99%) genotypes. Only relationships that were verified as true (i.e., ≤0.53% Mendelian inconsistencies between a parent–progeny pair) using the high–call rate genotypes were retained for analysis (i.e., 318 out of the 332).
The mean percentage of Mendelian inconsistencies per animal was then determined using the low–call rate genotypes for each call rate category across the 318 verified parent–progeny relationships.
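The opposing-homozygote check can be sketched in Python as follows, assuming genotypes coded 0/1/2 with None for missing. The denominator here is the number of SNP at which both animals are called and homozygous, which is one reading of "homozygous SNP used in the comparison"; note that the count threshold of 75 corresponds to roughly 0.53% of the 14,114 autosomal SNP, so the denominator used in the study may differ. Function names are hypothetical.

```python
def opposing_homozygotes(parent, progeny):
    """Return (opposing homozygous SNP, homozygous SNP compared) for one pair.

    Genotypes are coded 0/1/2 with None for missing; a SNP is compared only when
    both animals are called and homozygous (assumed definition of the denominator).
    """
    compared = conflicts = 0
    for p, o in zip(parent, progeny):
        if p in (0, 2) and o in (0, 2):
            compared += 1
            if p != o:  # one animal is 0/0 and the other 2/2
                conflicts += 1
    return conflicts, compared


def verify_pair(parent, progeny, max_pct=0.53):
    """Percentage of Mendelian inconsistencies and whether the pair is verified."""
    conflicts, compared = opposing_homozygotes(parent, progeny)
    if compared == 0:
        raise ValueError("no homozygous SNP in common")
    pct = 100.0 * conflicts / compared
    return pct, pct <= max_pct


print(verify_pair([0, 2, 1], [0, 2, 2]))  # (0.0, True)
```

Applied as in the study, a pair would first be checked with the high–call rate genotypes and, if verified, re-checked with the low–call rate genotypes.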


RESULTS

A total of 142,342 samples from 141,591 animals from 36,628 herds were available. The mean (median) call rate for all 142,342 samples was 99.09% (99.63%), varying from a minimum of 15.10% to a maximum of 99.87%. A total of 132,454 (93.55%) animals had a call rate ≥ 99% whereas only 1,656 (i.e., 1.17%) animals failed to achieve the minimum call rate threshold of 90% for inclusion into the Irish genomic evaluations (Fig. 1). As individual animal call rate decreased, the range in the heterozygosity rate per individual increased, ranging from 0.04 to 0.88 when animal call rate was <80% and from 0.25 to 0.58 when animal call rate was ≥80% (Fig. 2). The distribution of the MAF for all SNP used in the present study is in Supplementary Fig. S2 (see the online version of the article at http://journalofanimalscience.org); the median MAF across all SNP was 0.42. The mean call rate per SNP was 99.30%, whereas 61 SNP failed to achieve the standard SNP call rate threshold of ≥90% for inclusion in the Irish national genomic evaluation. As animal call rate decreased, the call rate per SNP also decreased; 32.04% of SNP had a SNP call rate < 20% when animal call rate was <40%, whereas when animal call rate was between 85 and 90%, 84.83% of SNP had a call rate between 80 and 100% (Fig. 3). The call rate per SNP also differed across all purebred breeds. Commonality across breeds, however, existed among SNP that failed to achieve a ≥90% SNP call rate; for example, 73% of SNP with a call rate of < 90% in the Hereford population also had a SNP call rate < 90% in the Charolais population (Table 1). The mean GC score per SNP increased as animal call rate increased; the mean GC score per SNP was 0.45 when animal call rate was <80%, whereas the mean GC score per SNP was 0.76 when animal call rate was between 85 and <99%.
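The cross-breed overlap reported above (and laid out in Table 1) can be sketched in Python with set operations. The sketch assumes, hypothetically, that the SNP with call rate < 90% within each breed have already been collected into one set per breed; the SNP IDs below are made up.

```python
def failed_snp_overlap(failed_by_breed):
    """Proportion of SNP failing in breed `col` that also fail in breed `row`.

    failed_by_breed: dict breed -> set of SNP IDs with call rate < 90% in that breed.
    Returns dict (row, col) -> value; the diagonal holds the count of failed SNP,
    mirroring the layout of Table 1.
    """
    table = {}
    for col, col_failed in failed_by_breed.items():
        for row, row_failed in failed_by_breed.items():
            if row == col:
                table[(row, col)] = len(col_failed)
            elif col_failed:
                table[(row, col)] = len(col_failed & row_failed) / len(col_failed)
            else:
                table[(row, col)] = float("nan")  # no failed SNP in this breed
    return table


failed = {
    "Hereford": {"snp1", "snp2", "snp3", "snp4"},
    "Charolais": {"snp1", "snp2", "snp3", "snp9"},
}
print(failed_snp_overlap(failed)[("Charolais", "Hereford")])  # 0.75
```

Because the denominator is the column breed's failed set, the matrix is not symmetric, which is why corresponding entries above and below the diagonal of Table 1 differ.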

Figure 1.

Distribution of individual animal call rate for the first genotyped record on 141,591 animals.

 
Figure 2.

Scatter plot of individual animal call rate against animal mean heterozygosity on the first genotyped record.

 
Figure 3.

The percentage of SNP within each SNP call rate bin for each animal call rate category.

 

Table 1.

The number of SNP that failed to achieve ≥90% SNP call rate within each pure breed population (diagonal) and the proportion of these failed SNP in the breed represented by each column that also failed in the breed represented by each row. All animals were predicted to be >87.5% pure. The numbers of purebred animals within each population were 5,556 Angus, 779 Belgian Blue, 14,388 Charolais, 3,314 Hereford, 13,243 Holstein, 15,266 Limousin, 1,235 Shorthorn, and 3,469 Simmental.

 
Breed | Angus | Belgian Blue | Charolais | Hereford | Holstein | Limousin | Shorthorn | Simmental
Angus | 63 | 0.862 | 0.892 | 0.843 | 0.803 | 0.892 | 0.813 | 0.823
Belgian Blue | 0.784 | 57 | 0.872 | 0.735 | 0.774 | 0.852 | 0.784 | 0.823
Charolais | 0.754 | 0.813 | 53 | 0.735 | 0.735 | 0.892 | 0.725 | 0.794
Hereford | 0.843 | 0.813 | 0.872 | 63 | 0.735 | 0.852 | 0.754 | 0.774
Holstein | 0.813 | 0.862 | 0.892 | 0.754 | 64 | 0.852 | 0.725 | 0.774
Limousin | 0.784 | 0.823 | 0.921 | 0.754 | 0.735 | 55 | 0.725 | 0.794
Shorthorn | 0.833 | 0.882 | 0.872 | 0.764 | 0.725 | 0.843 | 64 | 0.852
Simmental | 0.813 | 0.892 | 0.921 | 0.764 | 0.754 | 0.892 | 0.835 | 62

Factors Associated with Individual Animal Call Rate

Factors associated (P < 0.001) with individual animal call rate were the gender of the animal, the age of the animal at genotyping, the herd of the animal, the genotyping date, the plate well, and the sample plate. The odds of a poor call rate were 1.05 times (95% confidence interval 1.02–1.08) greater in male animals than in female animals. The odds of a poor call rate were marginally greater in animals that were less than 6 mo of age at the time of genotyping relative to animals >10.5 yr of age (odds ratio 1.13; 95% confidence interval 1.08–1.18; Supplementary Fig. S3 [see the online version of the article at http://journalofanimalscience.org]). The likelihood of a poor call rate differed across plate wells (Supplementary Fig. S4; see the online version of the article at http://journalofanimalscience.org), with an increased likelihood of a poor call rate in 2 wells (B03 and E01). Animal call rate was lowly heritable (0.018; SE 0.004), with a genetic SD of 0.076.

Genotype Accuracy

Genotype accuracy, depicted as the mean genotype and allele concordance rate between the 487 low– and high–call rate genotypes, improved at a diminishing rate as individual animal call rate improved (Fig. 4). The greater allele concordance rate as opposed to the genotype concordance rate in animals with call rates < 80% suggests that homozygous SNP, on average, were incorrectly called as heterozygous genotypes rather than the opposite homozygote. Indeed, as animal call rate improved in the low–call rate samples, the percentage of genotype discrepancies that were mistakenly called as heterozygous genotypes instead of the opposite homozygote increased; 38.46% of genotype errors were incorrectly called as heterozygous genotypes when animal call rate was <40%, whereas 83.28% of genotype errors were incorrectly called as heterozygous genotypes when animal call rate was between 75 and <80%. When animal call rate was <85%, substantial variability in mean within-animal genotype and allele concordance was evident; mean concordance rate per animal ranged from 0.31 to 1.00. This variability among individuals was minimal when animal call rate was between 85 and <99%; mean concordance rate per animal ranged from 0.96 to 1.00. The mean within-animal allele concordance rate of rare variants (MAF < 0.05) between low and high genotype call rate animals increased as animal call rate increased, and an allele concordance rate of 1.00 was achieved when animal call rate was between 85 and <99% (Fig. 5).
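The split of genotype discrepancies into error types can be sketched in Python as below, treating the high–call rate genotype as the truth and assuming genotypes coded 0/1/2 with None for missing. The text does not state how a true heterozygote called homozygous was counted, so this sketch keeps it as a separate, hypothetical category.

```python
from collections import Counter


def classify_discrepancies(low, high):
    """Count discrepancy types between low (called) and high (truth) genotypes."""
    counts = Counter()
    for called, truth in zip(low, high):
        if called is None or truth is None or called == truth:
            continue  # skip missing calls and concordant genotypes
        if called == 1:
            counts["hom_called_het"] += 1  # true homozygote called heterozygous
        elif truth == 1:
            counts["het_called_hom"] += 1  # true heterozygote called homozygous
        else:
            counts["opposing_hom"] += 1  # 0 called as 2, or 2 called as 0
    return counts


c = classify_discrepancies([1, 1, 0, 2, None, 0], [0, 2, 2, 1, 1, 0])
share = c["hom_called_het"] / sum(c.values())
print(share)  # 0.5
```

The share of `hom_called_het` among all discrepancies corresponds to the percentages of errors miscalled as heterozygous reported above for each call rate category.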

Figure 4.

Mean genotype (continuous line with diamond) and allele (broken line with squares) concordance rate per animal for each call rate category between low– and high–call rate genotypes per animal. Error bars represent the best and worst mean concordance rate per animal.

 
Figure 5.

Mean allele concordance rate between low– and high–call rate animals by each minor allele frequency bin for each animal call rate category.

 

Imputation

The imputation accuracy of nonobserved genotypes improved as animal call rate increased; the mean genotype concordance rate of nonobserved SNP was 0.41 when animal call rate was <40%, whereas when animal call rate was between 95 and <99%, the mean genotype concordance of nonobserved SNP was 0.95 (Table 2). When animal call rate was between 80 and <85%, the range in the mean genotype concordance rate per animal across all SNP after imputation was large, ranging from 0.55 to 0.99. However, when animal call rate increased to between 85 and <90%, the variability in the mean genotype concordance rate per animal decreased dramatically, ranging from 0.95 to 1.00 (Table 2). The exclusion of a direct relative from the reference population had minimal impact on imputation accuracy; mean genotype concordance rate decreased by 0.009, from 0.690 to 0.681, when animal call rate was <85% and by 0.0007, from 0.989 to 0.988, when animal call rate was between 85 and <99%.


Table 2.

Mean genotype and allele concordance rates (CR) per animal before and after imputation between low– and high–call rate animal genotypes for each call rate category. The minimum and maximum individual animal concordance rates are in parentheses.

 
Call rate class (%) | Number of animals | Genotype CR before imputation | Allele CR before imputation | Genotype CR after imputation, noncalled SNP | Allele CR after imputation, noncalled SNP | Genotype CR after imputation, all SNP | Allele CR after imputation, all SNP
<40 | 49 | 0.393 (0.313, 0.573) | 0.630 (0.537, 0.770) | 0.406 (0.371, 0.463) | 0.642 (0.602, 0.686) | 0.402 (0.358, 0.486) | 0.638 (0.585, 0.708)
40 to <50 | 63 | 0.481 (0.301, 0.761) | 0.691 (0.532, 0.878) | 0.433 (0.375, 0.545) | 0.664 (0.604, 0.750) | 0.456 (0.341, 0.634) | 0.677 (0.579, 0.802)
50 to <60 | 84 | 0.606 (0.311, 0.899) | 0.780 (0.526, 0.949) | 0.487 (0.372, 0.764) | 0.707 (0.595, 0.879) | 0.553 (0.341, 0.846) | 0.747 (0.561, 0.922)
60 to <70 | 59 | 0.768 (0.365, 0.980) | 0.877 (0.593, 0.989) | 0.601 (0.385, 0.890) | 0.782 (0.605, 0.944) | 0.710 (0.370, 0.953) | 0.844 (0.595, 0.975)
70 to <75 | 48 | 0.860 (0.490, 0.977) | 0.929 (0.727, 0.988) | 0.698 (0.407, 0.910) | 0.838 (0.686, 0.955) | 0.815 (0.485, 0.952) | 0.904 (0.718, 0.976)
75 to <80 | 48 | 0.898 (0.566, 0.933) | 0.949 (0.779, 0.996) | 0.728 (0.052, 0.945) | 0.851 (0.382, 0.970) | 0.859 (0.445, 0.983) | 0.926 (0.686, 0.990)
80 to <85 | 47 | 0.962 (0.640, 1.000) | 0.981 (0.819, 0.999) | 0.880 (0.121, 0.987) | 0.936 (0.535, 0.985) | 0.948 (0.547, 0.994) | 0.973 (0.762, 0.996)
85 to <90 | 80 | 0.987 (0.957, 1.00) | 0.993 (0.978, 0.999) | 0.939 (0.851, 0.987) | 0.967 (0.918, 0.993) | 0.981 (0.947, 0.998) | 0.990 (0.973, 0.999)
90 to <95 | 6 | 0.986 (0.982, 0.991) | 0.993 (0.991, 0.996) | 0.941 (0.927, 0.957) | 0.969 (0.961, 0.978) | 0.982 (0.997, 0.988) | 0.991 (0.988, 0.994)
95 to <99 | 3 | 0.998 (0.998, 0.999) | 0.999 (0.998, 0.999) | 0.948 (0.916, 0.969) | 0.973 (0.958, 0.981) | 0.997 (0.996, 0.999) | 0.999 (0.998, 0.999)

Parentage

A paternal error rate of 13.28% and a maternal error rate of 10.18% were detected across all 58,020 genotyped parent–progeny pairs (Table 3); 3.13% of the animals had both parents incorrectly identified. Of the 487 low– and high–call rate animals, 318 parent–progeny relationships were identified and genomically verified as true (i.e., ≤0.53% Mendelian inconsistencies between a parent–progeny pair when parentage verification was completed using the high–call rate genotypes). When parentage verification was then completed using the low–call rate genotypes, the percentage of Mendelian inconsistencies detected between a parent–progeny pair increased as animal call rate decreased (Table 4). When animal call rate was <80%, true parent–progeny relationships could no longer be verified, as an average of >0.53% Mendelian inconsistencies existed between a pair. Although an average of only 0.25% Mendelian inconsistencies was detected between parent–progeny pairs when animal call rate was between 80 and 85%, the variability among pairs was large, ranging from 0 to 1.23%. As a result, 11.11% of parent–progeny relationships within the 80 to 85% call rate category failed to be verified. However, when animal call rate increased to 85 to 90%, all parent–progeny relationships could be verified.


Table 3.

The number and percentage of parentage errors on all high–call rate genotypes

 
Relationship | Number of relationships | Parentage errors (%)
Sire incorrect | 44,491 | 13.28
Dam incorrect | 13,529 | 10.18
Both sire and dam incorrect | 5,424 | 3.13

Table 4.

The mean, SD, minimum, and maximum percentage of SNP Mendelian inconsistencies detected between a parent–progeny pair per animal call rate category

 
Call rate category (%) | Number of animals | Mean | SD | Minimum | Maximum
<40 | 29 | 27.23 | 6.63 | 12.35 | 39.80
40 to <50 | 24 | 19.25 | 11.91 | 3.67 | 40.74
50 to <60 | 28 | 10.37 | 9.76 | 1.07 | 43.03
60 to <70 | 44 | 4.68 | 6.75 | 0.07 | 28.09
70 to <75 | 34 | 0.80 | 1.10 | 0.00 | 4.24
75 to <80 | 32 | 0.91 | 1.52 | 0.00 | 6.29
80 to <85 | 36 | 0.25 | 0.31 | 0.00 | 1.23
85 to <90 | 82 | 0.12 | 0.10 | 0.00 | 0.53
90 to <95 | 4 | 0.02 | 0.03 | 0.00 | 0.04
95 to <99 | 0 | – | – | – | –


DISCUSSION

The motivation for this study was to determine the minimum individual animal call rate threshold required to ensure genotype integrity for subsequent genomic analysis; based on the results from the present study, a minimum threshold of 85% is recommended. Reducing the current threshold call rate in Irish genomic evaluations from ≥90 to ≥85% will be financially favorable for the breeder because, presently, any animal failing to achieve the 90% threshold call rate must be resampled and regenotyped. Moreover, requesting a resample can frustrate farmers. Nonetheless, only 0.2% of the data had an animal call rate between 85 and 90%.

Factors Associated with Call Rate

Genotype call rate across all samples was, on average, high, with only 1.17% of samples failing to achieve the minimum call rate threshold of 90% for inclusion in the Irish genomic evaluations. Poor-quality DNA, which is often characterized by excessive heterozygosity, is known to contribute to low sample call rates (Anderson et al., 2010). As excessive heterozygosity (heterozygosity rate > 0.5) was detected in a large percentage of failed samples (40.99% of samples with a call rate < 85% compared with 0.12% of samples with a call rate ≥ 85%), poor-quality DNA may have contributed to the failure of these samples. Several sources of poor-quality DNA have been well documented and include DNA contamination from other DNA sources/samples, improper sample storage, limited quantity of the DNA sample, sample degradation over time, and reduced efficiency of the DNA extraction method (Anderson et al., 2010; Flickinger et al., 2015). Deoxyribonucleic acid contamination can also occur at sampling, and the association of the herd of sampling with animal call rate variability in the present study suggests this may be a contributing factor. One possibility to reduce the incidence of samples with poor call rates is to notify herds that are prone to poor sample call rates and, if necessary, provide a clearer demonstration of how the biological sample should be taken and stored.

It has been previously documented that genotype calling algorithms that apply different models to male and female samples, such as Illuminus and CRLMM, generally perform better than algorithms that do not, such as the GC algorithm used in the present study (Ritchie et al., 2011). Ritchie et al. (2011) reported that GC was marginally worse than the other genotype calling algorithms for calling SNP on the X chromosome. In the present study, a marginal difference was detected in the mean percentage of called SNP on the X chromosome between male and female samples; the mean X chromosome call rate was 98.75% (SD 2.99%) in female samples and 98.12% (SD 5.79%) in male samples. Therefore, the greater likelihood of a poor animal call rate in male samples in the present study may reflect the reduced ability of GC to call SNP on the X chromosome in male samples.

Genotyping plate effects have also been previously reported in human studies to affect genotype call rates (Pluzhnikov et al., 2010; Turner et al., 2011). Plate effects commonly occur when genotype calling is completed on a plate-by-plate basis and a small number of poor-quality DNA samples alter the genotype calls of the other samples on the plate. Although clustering was not undertaken on a plate-by-plate basis in the present study, it is possible that a small number of samples per plate did not conform to the standardized clustering of the GenTrain algorithm and, consequently, had poor genotype calling. It is unclear why an increased likelihood of a poor call rate existed in 2 wells (B03 and E01) in the present study, although a previous study by Laurie et al. (2010) also found plate well to have a significant effect on missing call rate. The reason for the low, but nonzero, heritability of call rate in the present study is unclear, and we are not aware of any other study in any species that has attempted to estimate it.

Genotype Accuracy

The trend of improved accuracy of the called genotypes with increasing animal call rate corroborates a previous study by Cooper et al. (2013) on 1,216 cattle of low and high call rates genotyped across 4 different genotyping platforms. Cooper et al. (2013), however, using the same definition of genotype concordance, documented a lower mean genotype concordance rate (0.967) than the present study (0.987) for individuals with an animal call rate between 86 and 90%, and they ultimately recommended a threshold animal call rate of 90%. Because no difference was detected in genotype concordance rate between the 85 to <90% and 90 to <95% call rate categories in the present study, and minimal variability in the mean within-animal genotype concordance rate existed among individuals when animal call rate was between 85 and <90%, an animal call rate threshold of 85% is recommended based on the results from the present study. Several studies have used lower animal call rate thresholds; Boddhireddy et al. (2014), for example, used a call rate of 80%, and although the mean genotype accuracy in the present study at this call rate was high (genotype concordance rate 0.962), the variability in genotype accuracy among individuals was large (Table 2). Therefore, caution should be taken when interpreting results from studies with animal call rate thresholds of 80 to 85%, as data quality not only influences the accuracy of genomic predictions (Edriss et al., 2013) but can also result in false-positive and false-negative results in association studies (Turner et al., 2011).

Discrepancies between genotypes called in the same high– and low–call rate individuals were increasingly called as heterozygous genotypes rather than the opposing homozygous genotype as animal call rate increased; a similar conclusion was reported by Cooper et al. (2013) in dairy cattle. Rare alleles are often difficult to call using current genotyping algorithms due to the smaller size of the heterozygote and rare homozygote clusters (Anderson et al., 2010). However, as the number of rare variants included on genotyping panels increases, it is essential that the quality of their genotype calls remains high. In the present study, high SNP call rates (mean 98.26%) were achieved for all rare variants (MAF < 0.05), although the accuracy of the genotypes called decreased when individual animal call rate was <70%. High-quality genotype calls were achieved for all variants when individual animal call rate was ≥85%, irrespective of the MAF of the variant.

Imputation

Imputation algorithms can be used to impute sporadically missing genotypes as well as to correct sporadic genotyping errors (Marchini et al., 2007). The accuracy of imputing unobserved SNP genotypes has previously been shown to be poor when a large proportion of the genotype data is missing (Wang et al., 2012; Xavier et al., 2016). Xavier et al. (2016) reported that when 20% of the genotypes of an individual were missing, the imputation accuracy of the missing genotypes varied among imputation algorithms. In the present study, the imputation accuracy of missing genotypes was poor in low–call rate animals (call rate < 80%); this reduction in accuracy is most likely due to both the high level of genotyping errors and the missing genotypes in low–call rate animals. Therefore, when faced with large amounts of missing genotype data, the performance of the chosen imputation algorithm should be considered. This is especially true for emerging technologies such as genotyping-by-sequencing, which is commonly characterized by high rates of missing genotypes (Crossa et al., 2013).
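A common way to quantify imputation accuracy is to mask genotypes whose true values are known, impute them, and score the imputed values against the truth. The sketch below shows only the masking and scoring steps under that assumption; the imputation itself (for example, a population- or family-based imputation tool) is deliberately left out, and `mask_genotypes` and `imputation_accuracy` are hypothetical helpers rather than the procedure used in this study. Accuracy here is the proportion of masked genotypes imputed correctly; the correlation between true and imputed genotypes is another commonly reported metric.

```python
import random
from typing import Optional, Sequence

Genotype = Optional[int]  # 0/1/2 allele count; None = missing call


def mask_genotypes(genotypes: Sequence[int], fraction: float, seed: int = 1):
    """Hide a random fraction of known genotypes.

    Returns the masked copy (hidden positions set to None) and the sorted
    indices that were hidden, so imputed values can later be scored.
    """
    rng = random.Random(seed)  # seeded for reproducible masking
    n_mask = round(fraction * len(genotypes))
    hidden = sorted(rng.sample(range(len(genotypes)), n_mask))
    masked: list[Genotype] = list(genotypes)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def imputation_accuracy(truth: Sequence[int], imputed: Sequence[int],
                        hidden: Sequence[int]) -> float:
    """Proportion of masked genotypes whose imputed value equals the true genotype."""
    if not hidden:
        raise ValueError("no genotypes were masked")
    return sum(imputed[i] == truth[i] for i in hidden) / len(hidden)
```

Masking 20% of an animal's genotypes with `mask_genotypes(truth, 0.2)` mirrors the level of missingness discussed above; the masked vector would be passed to the imputation tool, and its output scored with `imputation_accuracy`.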

Greater imputation accuracies have commonly been achieved when direct relatives of the target population are included in the reference population (Ma et al., 2013; Berry et al., 2014; Sargolzaei et al., 2014). In the present study, imputation accuracy increased only slightly when a direct relative was included in the reference population; the increase was smaller than expected, most likely because of the large size of the reference population. This result is consistent with García-Ruiz et al. (2015), who reported that the gain in accuracy from including a direct relative in the reference population diminishes as the reference population grows. In addition, the high level of genotyping errors in low–call rate animals most likely interfered with family-based phasing and imputation, as the progeny’s haplotypes did not match those of the parent.

Parentage

Parentage verification is essential to achieving genetic gain in genetic evaluation systems, as it not only affects the accuracy of animal evaluations but can also be used to reduce inbreeding (Banos et al., 2001; Dodds et al., 2005). The accurate determination of parentage from SNP genotype panels in cattle (Fisher et al., 2009; Calus et al., 2011; Hayes, 2011) has resulted in SNP replacing the traditional, generally more expensive, microsatellite marker method of verification. Although a relatively large pedigree error rate, particularly for the sire (13.28%), was detected in the present study, this estimate is in line with international studies of dairy cattle (Visscher et al., 2002; Weller et al., 2004; Sanders et al., 2006). Several reasons for such pedigree errors were previously documented by Christensen et al. (1982), including 1) the insemination of cows pregnant from a previous insemination, 2) natural-service bulls impregnating previously inseminated cows that were incorrectly assumed to be pregnant, 3) mistakes by the AI company in labeling semen, 4) AI technicians incorrectly identifying semen samples, 5) errors when entering the bull’s herdbook number/name into the insemination record, 6) mix-up of calves or samples at birth, and 7) potential errors at the laboratory level. However, as the number of genotyped animals continues to increase, it becomes increasingly possible to identify and correct pedigree errors (McClure et al., 2015). By choosing an animal call rate threshold of 85%, the maximum number of genotyped animals can be included in the pedigree validation or invalidation process.
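SNP-based parentage verification commonly relies on counting opposing homozygotes: SNPs at which the putative parent and progeny carry genotypes 0 and 2, which is impossible under Mendelian inheritance unless one of the calls is wrong. The sketch below computes that conflict rate. The 1% tolerance in `parent_verified` is a hypothetical value chosen for illustration, not the rule used in this study; it is nonzero because genotyping errors, such as those seen in low–call rate animals, produce occasional conflicts even for true parent–progeny pairs. Both function names are hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0/1/2 allele count; None = missing call


def opposing_homozygote_rate(parent: Sequence[Genotype],
                             progeny: Sequence[Genotype]) -> float:
    """Share of SNPs, called in both animals, where one genotype is 0 and the other 2.

    A true parent passes one of its two alleles to the progeny, so an
    opposing-homozygote SNP indicates either a pedigree or a genotyping error.
    """
    compared = conflicts = 0
    for g_p, g_o in zip(parent, progeny):
        if g_p is None or g_o is None:
            continue  # only SNPs called in both animals are informative
        compared += 1
        if abs(g_p - g_o) == 2:
            conflicts += 1
    if compared == 0:
        raise ValueError("no SNP called in both animals")
    return conflicts / compared


def parent_verified(parent: Sequence[Genotype], progeny: Sequence[Genotype],
                    max_conflict_rate: float = 0.01) -> bool:
    """Accept the parent if the conflict rate is at or below a HYPOTHETICAL tolerance."""
    return opposing_homozygote_rate(parent, progeny) <= max_conflict_rate
```

In practice the tolerance would be set from the observed distribution of conflict rates in known true and false parent–progeny pairs, and animals with low call rates would be expected to inflate the conflict rate of true pairs.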

Conclusions

Animal call rate affects genotype integrity. Factors associated with individual animal call rate included the sex and age of the animal, the herd at sampling, the genotyping date, the sample plate, and the plate well. Imputation accuracy of missing genotypes in low–call rate animals was poor, most likely due to the high level of genotyping errors and missing genotypes. Minimal genotyping errors were detected when animal call rate was ≥85%, and parentage verification across genotyped parent–progeny relationships was reliable at this threshold. Therefore, to account for the cost of resampling and regenotyping while ensuring that the maximum number of accurate genotypes is included in subsequent analyses, we suggest a minimum animal call rate threshold of 85%.
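Applying the recommended threshold amounts to a simple per-sample filter. The sketch assumes each sample's genotypes are held as a list with `None` for missing calls, and that animal call rate is the proportion of non-missing calls; `call_rate`, `split_by_call_rate`, and `CALL_RATE_THRESHOLD` are hypothetical names. Samples below the threshold are returned separately as candidates for resampling and regenotyping rather than being silently discarded.

```python
from typing import Mapping, Optional, Sequence

Genotype = Optional[int]  # 0/1/2 allele count; None = missing call
CALL_RATE_THRESHOLD = 0.85  # minimum animal call rate suggested in this study


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Proportion of SNPs with a non-missing genotype call for one sample."""
    if not genotypes:
        raise ValueError("empty genotype vector")
    return sum(g is not None for g in genotypes) / len(genotypes)


def split_by_call_rate(samples: Mapping[str, Sequence[Genotype]],
                       threshold: float = CALL_RATE_THRESHOLD):
    """Return (retained, flagged) sample IDs; flagged samples are resampling candidates."""
    retained: list[str] = []
    flagged: list[str] = []
    for sample_id, genotypes in samples.items():
        if call_rate(genotypes) >= threshold:
            retained.append(sample_id)
        else:
            flagged.append(sample_id)
    return retained, flagged
```

With 20 SNPs, for instance, a sample missing 3 calls (call rate 0.85) is retained, whereas a sample missing 4 calls (call rate 0.80) is flagged.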

 

References

Footnotes


