1st Page

Journal of Animal Science - Animal Genetics

Technical note: Updating the inverse of the genomic relationship matrix1


This article in JAS

  1. Vol. 91 No. 6, p. 2583-2586
    Received: Nov 6, 2012
    Accepted: Mar 7, 2013
    Published: November 25, 2014

    2 Corresponding author(s):

  1. K. Meyer 2,
  2. B. Tier and
  3. H.-U. Graser
  1. Animal Genetics and Breeding Unit3, University of New England, Armidale New South Wales 2351, Australia


A computing strategy to update the inverse of the genomic relationship matrix when new genotypes become available is described. It is shown that re-using results of previous computations can result in substantial reductions in computing time required. For instance, when the number of individuals increased by about 1% for matrices larger than 15,000, the time required for updating was less than 7% of that used for direct inversion from scratch.


Use of genomic information in genetic evaluation by replacing the standard, additive genetic relationship matrix with the genomic or “realized” relationship matrix (GRM) is becoming a standard procedure in livestock improvement schemes. This includes the so-called genomic BLUP and the single step method, which allows information on all phenotyped and genotyped individuals to be used simultaneously (Aguilar et al., 2010). If symmetry of the coefficient matrix in the pertaining mixed model equations is to be retained, both require the inverse of the GRM, which is a dense matrix without known structure. Hence, the inverse needs to be computed explicitly. This involves computations proportional to the cubic power of the number of genotyped individuals and can present a considerable computational burden.

Typically, a small number of additional genotypes, compared with the total number, becomes available between rounds of genetic evaluation. This suggests that computational requirements to invert the GRM can be reduced by using computations saved from the previous round. This note describes a strategy to do so and illustrates the magnitude of reduction in computational effort that can be achieved.

Updating the inverse

Strategies to update the inverse of a matrix directly are available for the case where the modification of the matrix is given by a low rank update. For instance, if B = AA′ and we augment A by adding a column a, B is replaced by B + aa′. For B–1 known, the new inverse is then given by (B + aa′)–1 = B–1 – αB–1aaB–1, with α = (1 + aB–1a)–1 (Hager, 1989). The GRM is generally obtained as ZZ′/s or similar, in which Z denotes the matrix of allele counts with number of rows equal to the number of individuals and s a scaling factor (Van Raden, 2008). Hence, such methods might be applicable if the number of genotypes remained constant and the density of genomic information (number of SNP) increased. However, with the number of genotypes growing, more general relationships based on partitioned matrix results need to be exploited.

Assume the GRM, denoted by G, has full rank or has been modified to ensure that G is “safely” positive definite. A common strategy to invert a dense, symmetric positive definite matrix efficiently involves determining the Cholesky factorization of the matrix G = LL′, with L the Cholesky factor, as the first step. The lower triangular matrix L is then inverted, which yields L–1 of the same form. Finally, this inverse is multiplied with its transpose, yielding G–1 = LTL–1 (with LT denoting the transpose of L–1).

Partition G, of size n × n, into a part G11 due to n1 genotypes known previously and parts G21 and G22 due to n2 new individuals genotyped, and partition its Cholesky factor correspondingly:

Partitioned matrix results then give inverse matrices

Computing Strategy

Assume further that is known (i.e., that any modifications to make G positive definite have not altered G11) and that the inverse of its Cholesky factor,, has been saved. Computing G–1 re-using these previous results then involves these steps:

  • 1. Complete the Cholesky factorization of G: With known, the off-diagonal block of L is simply obtained as which gives . The latter matrix operation represents a so-called symmetric rank update of G22, involving n1n2(n2 + 1)/2 multiplications. Factorization of G22.1 then yields L22.

  • 2. Construct the inverse Cholesky factor L–1: With available, the only inverse to be calculated is that of L22, reducing the number of operations required from those proportional to n3 (to invert L) to a value proportional to . The off-diagonal block of L–1 is then evaluated as exploiting the triangular nature of both and .

  • 3. Compute the inverse matrix G–1: The upper diagonal block of G–1 is determined by adjusting the previous inverse for the covariances between old and new genotypes, a rank n2 update of (n2n1(n1 + 1)/2 multiplications). The off-diagonal blocks of G–1 are then obtained by premultiplying L21 with , again accounting for its triangularity (n2n1(n2 + 1)/2 multiplications),

Finally, the lower diagonal block of G1 is calculated as which involves n2(n2 + 1)(n2 + 2)/6 multiplications.

Alternatively, as we can obtain G–1 aswith and appropriate modifications of the steps above. This eliminates the need to form and store L1 explicitly but replaces multiplications by triangular matrices, and ,with that of “full” matrices, and G22 (increasing the number of multiplications required for the product of 2 matrices of sizes n × n and n × m, respectively, from mn(n + 1)/2 to mn2).


To evaluate the performance of the proposed strategy, a dense matrix G of size 27,000 was assembled by first creating the pedigree based numerator relationship matrix for 1,080 independent families of size 25, with each family comprising a sire mated to 4 unrelated dams and 5 progeny per dam. To mimic the structure of a GRM, a random number between –0.001 and 0.001 sampled from a uniform distribution was then added to each of the off-diagonal elements. Appropriate submatrices were extracted to evaluate inverses of size n = n1 + n2 for various combinations of n1 and n2.

The strategy described was implemented in a fortran 95 program (, using double precision calculations throughout and “packed” storage, that is, storing only 1 triangle of symmetric matrices in a 1-dimensional array, with all read and write operations in binary form. This was done in a “stand alone” version not using any statistical libraries (version A), using right looking algorithms for factorization and inversion, and 2 versions (B1 and B2) using routines from the LAPACK (Anderson et al., 1999) and BLAS (Dongarra et al., 1988; Blackford et al., 2002) libraries. In particular, the LAPACK routines chosen used the rectangular fully packed format to store 1 triangle of the matrices only in a 1-dimensional array (Gustavson et al., 2010). This format allows blocked factorization and level 3 BLAS routines to be exploited. Further details on the routines used are given in the Appendix. Preliminary investigations had shown that the alternative updating strategy not using L1 required longer computing times throughout, and this option was therefore not pursued further.

To store the elements of 1 triangle of G or L and G11 or L11 (or their respective inverses) in packed format requires memory for n(n + 1)/2 and n1(n1 + 1)/2 variables. In addition a work vector of length n and 2 arrays of dimensions n2 × n1 and n2 × n2, respectively, were used. The latter 2 were needed as the BLAS matrix multiplication routines require matrices to be stored in 2-dimensional arrays. For comparability, both versions A and B1 used this amount of work space even though computations could have been performed “in place” with less work space using packed storage for A. Version B2 used an additional array of size n1 × n1 to allow more computations to be performed using BLAS level 3 routines. For inversion “from scratch” (n1 = 0), calculation and writing out of the intermediate file containing L1 was omitted unless the inversion was to be followed by updating steps.

All versions were compiled for sequential execution using the intel fortran compiler 13.0.1 (Intel Corporation, Santa Clara, CA). Optimized versions of the LAPACK and BLAS routines were loaded from the intel math kernel library, MKL 11.0. All calculations were performed on a desktop computer equipped with a second generation intel I7 processor (I7-3930K; Intel Corporation) rated at a speed of 3.2 GHz and with a cache size of 12 MB.


The performance of the updating method was evaluated by repeated updating of the inverse, starting at n1 = 2,000 and using steps of n2 = 200 and n2 = 500, respectively. Computing times required for updating and direct inversion without any intermediate steps are contrasted in Fig. 1. For the simple implementation without library routines, updating was highly beneficial, reducing the time for inversion of a matrix of size 25,000 from 93 to fewer than 6 min when updating from size n1 = 24,500 to n = 25,000. Using BLAS and LAPACK routines resulted in speed-ups of the direct inversion almost by a factor of 8, reducing the time for inversion of a matrix of size 25,000 to under 12 min. This was due to the highly optimized and cache efficient nature of the LAPACK routines used for factorization and inversion. This decreased the relative efficacy of the updating approach, especially for version B1, which did not use additional memory for a work array of size n1 × n1. Updating times for versions A and B1 were virtually identical for both step sizes. For version B2, updating from 24,500 to 25,000 required 18 s (i.e., less than a fraction of one-fourteenth of the time for direct inversion and just under one-sixth of the time used by versions A and B1). However, this needed 9.2 GB of memory, compared with 4.7 GB for the latter.

Figure 1.
Figure 1.

Time required to compute the inverse of a matrix of size n × n, using inversion “from scratch” (▬ • ▬ •) and updating (from nn2 to n) in steps of n2 = 200 (▬▬▬▬) and n2= 500(▬ ▬ ▬). A = Standalone version, B1 = version using BLAS and LAPACK versions, B2 = as B1 but using additional workspace. See online version for figure in color.


Updating was most advantageous for large matrices and when proportions of new individuals were small. This is further illustrated in Table 1, with results given obtained using version B2. For updating to use less than 30% of the time for direct inversion, the proportion of individuals to be added needed to be less than about 10%. Reductions to 10% or less were achieved for proportions smaller than about 2%.

View Full Table | Close Full ViewTable 1.

Relative time to compute the inverse of a matrix of size n × n by updating an inverse of size n1 × n1 for n2 individuals added (n = n1 + n2), expressed as proportion (in %) of time required for direct inversion of a matrix of the same size

New1 Number of “old” animals (n1), in thousands
n2 5 10 15 20 25
50 13 6 4 3 3
100 15 8 5 4 3
200 20 11 7 6 4
500 32 18 12 9 8
1000 47 28 20 16 13
2000 66 44 33 26 32
Time2 6 45 150 353 680
1Number of new animals.
2Computing time, in seconds, required for direct inversion of old matrix of size n1 (no intermediate steps).


Results clearly demonstrate that substantial savings in computational requirements can be achieved by “updating” the inverse of the GRM as new genotypes become available instead of inverting the new matrix from scratch. Not surprisingly, the strategy proposed is most advantageous when the proportion of individuals added is comparatively small.

As reported earlier by Aguilar et al. (2011), with large, dense matrices to be manipulated, optimization of operations and memory access as provided in BLAS and LAPACK library routines can increase the speed of computations dramatically. In our case, inversion “from scratch” as implemented with version A took almost 8 times as long as with versions B. Although not considered here, these routines also allow multiple processors to be exploited readily, which has the scope to further reduce computational times required markedly. For instance, Aguilar et al. (2011) achieved speed-ups in matrix multiplications by a factor of more than 3 when using 4 processors.

The price for computational savings through updating with the strategy proposed is the need to retain not only G1 but also L1 between rounds of genetic evaluation. For large populations, this may require substantial storage capacity, but disk space is readily available and relatively cheap. We assumed that G11 (and therefore and ) remained unchanged as new individuals became available. In practice, a change of allele frequencies may necessitate multiplication by a scale factor, which is readily accommodated. Other specific changes, such as a low order rank update of G, might be incorporated using Hager’s (1989) procedure described above. However, if G11 has been “tweaked” previously, for example, by adding a multiple of the identity matrix or by blending with its pedigree based equivalent, it seems reasonable to assume that this is not needed again and that only the new parts, G22 and G21, would be modified.

Very large problems may still require use of techniques to approximate the inverse of the GRM, such as recently proposed by Faux et al. (2012), or need to rely on (nonsymmetric) mixed model formulations without such inverse (see Misztal et al., 2009; Legarra and Ducrocq, 2012). However, as shown, use of efficient statistical library routines together with the strategy described can substantially extend the limits for the size of GRM matrices that can readily be inverted on a routine basis.


The method described can reduce the computational requirements to invert the genomic relationship matrix when additional genotypes become available by orders of magnitude. It is especially suited to scenarios where breeding values are estimated frequently and relatively small numbers are added between analyses.


Cholesky factorization and inversion of the Cholesky factor when “updating” were performed using LAPACK routines dpftrf and dtftri. To invert the genomic or “realized” relationship matrix (GRM) from scratch without intermediate steps, dpftrf was combined with routine dpftri, with all 3 routines using the rectangular fully packed format. Multiplications by triangular matrix (or its transpose) to compute L21 and G21 were performed with routine dtrmm. Adjustments to G22 to determine G22.1 could be obtained as a rank-k update for a symmetric matrix using dsyrk, and the product was computed with routine dlauum. For version B2, which used additional work space to store or in the lower diagonal block of a 2-dimensional array of size n1 × n1, routines dtrmm and dsyrk were also used to compute product and rank-k update ,respectively. Additional BLAS routines used were ddot, dcopy, dsca, daxpy and dtfttp, dtpttf, dtfttr, dtpttr, and dtrttp for conversion between matrices storage formats.




Be the first to comment.

Please log in to post a comment.
*Society members, certified professionals, and authors are permitted to comment.