
Theory

This section provides a brief primer on the theoretical background of SALTED, so that users can become familiar with the methods before doing the exercise. For a full description, please see the theory page of the SALTED documentation.

Density fitting method

The density fitting (DF) method, also known as the resolution of the identity (RI) ansatz, is commonly used in quantum chemistry, and several metrics can be chosen to perform the fitting. It assumes that the electron density \(\rho(\mathbf{r})\) can be expanded as a linear combination of auxiliary basis functions (in FHI-aims, we use the product basis functions which are also used in all hybrid-functional and Hartree-Fock calculations):

\[ \begin{aligned} \rho(\mathbf{r}) \approx \tilde{\rho}(\mathbf{r}) &= \sum\limits_{i,\sigma,\mathbf{U}} c_{i,\sigma} \phi_{i,\sigma} (\mathbf{r} - \mathbf{R}_{i} + \mathbf{T}(\mathbf{U})) \\ &= \sum\limits_{i,\sigma,\mathbf{U}} c_{i,\sigma} \left\langle\mathbf{r} \middle| \phi_{i,\sigma}(\mathbf{U}) \right\rangle. \\ \end{aligned} \]

Here \(i\) is the index of the atom on which the auxiliary basis function \(\phi_{i,\sigma}\) is centered, \(\mathbf{R}_{i}\) is the position of that atom, and the sum over \(\mathbf{U}\) runs over periodic images with lattice translation vectors \(\mathbf{T}(\mathbf{U})\). The composite index \(\sigma = n \lambda \mu\) combines \(n\), which labels the radial component of the auxiliary basis function, with \(\lambda \mu\), which corresponds to the spherical harmonics \(Y_{\lambda}^{\mu}\) describing its angular part.
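
As a purely illustrative sketch of this expansion (the auxiliary functions below are toy isotropic Gaussians, not the FHI-aims product basis, and all names are made up), the fitted density on a set of grid points is simply a coefficient-weighted sum of atom-centered basis functions:

```python
import numpy as np

# Toy auxiliary basis function centered on an atom (lambda = 0 only);
# the real product-basis functions carry radial and spherical-harmonic parts.
def gaussian_aux(r, center, alpha):
    d2 = np.sum((r - center) ** 2, axis=-1)
    return np.exp(-alpha * d2)

grid = np.random.rand(1000, 3) * 5.0          # points r at which to evaluate
centers = np.array([[1.0, 1.0, 1.0],          # atomic positions R_i
                    [3.0, 3.0, 3.0]])
alphas = np.array([0.8, 1.2])                 # exponents of the toy functions
coeffs = np.array([0.5, 0.3])                 # density-fitting coefficients c_i

# rho_tilde(r) = sum_i c_i * phi_i(r - R_i)
rho_tilde = sum(c * gaussian_aux(grid, R, a)
                for c, R, a in zip(coeffs, centers, alphas))
print(rho_tilde.shape)                        # one density value per grid point
```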

Symmetry-adapted descriptors and kernels

SALTED uses the \(\lambda\)-SOAP power spectrum as a descriptor of the atomic environments used to learn and predict electron densities. This is a modified version of the Smooth Overlap of Atomic Positions (SOAP) descriptor which builds rotational equivariance of order \(\lambda\) into the power spectrum. Linear kernel functions are constructed from the dot product of two symmetry-adapted descriptors; non-linear kernels can then be constructed by multiplying these linear kernels by their scalar (\(\lambda=0\)) counterparts. These kernels are written as \(k_{\mu\mu'}^{\lambda}(A_{i},M_{j})\), where \(A_{i}\) is the atomic environment of atom \(i\) and \(M_{j}\) is the atomic environment of atom \(j\) in the set of reference atomic environments (introduced in the next section); in the prediction formula below, the Kronecker delta \(\delta_{a_{i}, a_{j}}\) ensures that atoms \(i\) and \(j\) belong to the same atomic species. For more details, please check this paper.
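
As a hedged illustration (the descriptor shapes, normalization, and variable names below are assumptions, not SALTED's internals), a linear \(\lambda\)-kernel can be formed as a dot product of two symmetry-adapted descriptors, and a non-linear kernel obtained by scaling it with the scalar kernel:

```python
import numpy as np

lam = 2                      # equivariance order lambda
nfeat = 64                   # number of power-spectrum features (assumed)
rng = np.random.default_rng(0)

P_A = rng.normal(size=(2 * lam + 1, nfeat))   # descriptor of environment A_i
P_M = rng.normal(size=(2 * lam + 1, nfeat))   # descriptor of reference env M_j
p0_A = rng.normal(size=nfeat)                 # scalar (lambda = 0) descriptors
p0_M = rng.normal(size=nfeat)

# Linear equivariant kernel: k^lambda_{mu mu'} = sum_q P^lambda_{mu q}(A_i) P^lambda_{mu' q}(M_j)
k_lam = P_A @ P_M.T                           # shape (2*lam+1, 2*lam+1)

# Non-linear kernel: multiply by the scalar kernel raised to (zeta - 1)
zeta = 2
k0 = p0_A @ p0_M
k_nonlinear = k_lam * k0 ** (zeta - 1)
print(k_nonlinear.shape)
```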

The Subset of Regressors

In order to reduce the dimensionality of the machine learning problem, we use the subset of regressors (SoR) approach to approximate the full kernel matrix (\(N\) samples) using a subset of \(M < N\) atomic environments:

\[ K_{NN} \approx K_{NM} K_{MM}^{-1} K_{MN} \]
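
The following is a minimal numerical sketch of this approximation, using a generic Gaussian kernel on random toy descriptors rather than the \(\lambda\)-SOAP kernels used by SALTED, and a random reference subset (the actual selection is discussed next):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # N = 200 toy "environments"
idx_M = rng.choice(200, size=20, replace=False)   # M = 20 reference environments

def gaussian_kernel(A, B, gamma=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

K_NM = gaussian_kernel(X, X[idx_M])
K_MM = gaussian_kernel(X[idx_M], X[idx_M])

# K_NN ~= K_NM K_MM^{-1} K_MN  (use a linear solve rather than an explicit inverse)
K_NN_approx = K_NM @ np.linalg.solve(K_MM, K_NM.T)
print(K_NN_approx.shape)                          # (200, 200)
```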

These \(M\) environments are selected using farthest point sampling, a simple but useful greedy algorithm to select a diverse subset of points from a given set. Please check, e.g., this blog for details.
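
A minimal sketch of farthest point sampling on toy descriptors might look like this (the selection in SALTED may differ in details, e.g. in the distance metric used):

```python
import numpy as np

def farthest_point_sampling(X, n_select, first=0):
    """Greedily pick the point farthest from everything selected so far."""
    selected = [first]
    # distance of every point to its closest selected point
    dmin = np.linalg.norm(X - X[first], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(dmin))
        selected.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))         # toy descriptors of 500 environments
print(farthest_point_sampling(X, 20))  # indices of 20 diverse environments
```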

Symmetry-adapted prediction

Within the sparse-GPR formalism, the density coefficients \(c_{i, n\lambda\mu}\) are expressed as a linear combination of the kernel functions

\[ c_{i, n\lambda\mu} = c_{n\lambda\mu} (A_{i}) \approx \sum\limits_{j \in M} \sum\limits_{|\mu'| \le \lambda} b_{n\lambda\mu'} (M_{j}) k_{\mu\mu'}^{\lambda}(A_{i},M_{j}) \delta_{a_{i}, a_{j}} \]

where \(b_{n\lambda\mu'}\) are the regression weights, \(j\) runs over the sparse selection of \(M\) atomic environments, and the Kronecker delta function ensures that atoms \(i\) and \(j\) are of the same species.
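
A hedged sketch of this prediction step, with made-up array shapes for the kernel blocks and regression weights, could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n_rad, M = 1, 4, 10
mu_dim = 2 * lam + 1

# k[j] is the (2*lam+1) x (2*lam+1) kernel block k^lambda_{mu mu'}(A_i, M_j)
k = rng.normal(size=(M, mu_dim, mu_dim))
# b[j] holds the weights b_{n lambda mu'}(M_j) for each radial channel n
b = rng.normal(size=(M, n_rad, mu_dim))
# Kronecker delta delta_{a_i, a_j}: 1 if reference j has the same species as atom i
same_species = rng.integers(0, 2, size=M)

# c_{n lambda mu}(A_i) = sum_j sum_{mu'} b_{n lambda mu'}(M_j) k^lambda_{mu mu'}(A_i, M_j) delta
c = np.einsum('j,jnk,jmk->nm', same_species, b, k)
print(c.shape)     # (n_rad, 2*lam + 1): one coefficient per (n, mu)
```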

Reproducing kernel Hilbert space (RKHS)

Even though sparsification reduces the size of the kernel matrix, the approximation still involves \(\mathbf{K}_{MM}^{-1}\), which can be both expensive to compute and numerically ill-conditioned. Following the representer theorem and assuming we have a positive-definite kernel (which is the case, also after sparsification), we are guaranteed to find a Hilbert space with elements \(\mathbf{\Psi}\) whose dot product reproduces the kernel. This is called the reproducing kernel Hilbert space (RKHS).

By identifying the elements of this Hilbert space which satisfy

\[ \mathbf{K}_{NN}^{an\lambda} \approx \mathbf{\Psi}_{ND}^{an\lambda} (\mathbf{\Psi}_{ND}^{an\lambda})^{\top} \]

we are able to rewrite our expression for the predicted density coefficients as a linear regression task in terms of these feature vectors:

\[ c_{an\lambda\mu} (A_i) \approx \mathbf{\Psi}_{ND}^{an\lambda} (A_i) \tilde{\mathbf{b}}_{D}^{an\lambda} \]

In doing so, we further reduce the dimensionality of the problem to \(D\). For a detailed derivation of this expression, please refer to either the SALTED documentation or this paper.
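
As a sketch of one common way to build such feature vectors (diagonalize \(\mathbf{K}_{MM}\), discard small eigenvalues to set the truncated dimension \(D\), and project \(\mathbf{K}_{NM}\) onto the remaining eigenvectors; the exact construction used by SALTED may differ in details, and the kernel here is a generic Gaussian on toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
idx_M = rng.choice(200, size=30, replace=False)

def gaussian_kernel(A, B, gamma=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

K_NM = gaussian_kernel(X, X[idx_M])
K_MM = gaussian_kernel(X[idx_M], X[idx_M])

eigval, eigvec = np.linalg.eigh(K_MM)
keep = eigval > 1e-10 * eigval.max()                    # truncation sets D <= M
Psi = K_NM @ eigvec[:, keep] / np.sqrt(eigval[keep])    # N x D feature matrix

# Psi Psi^T reproduces the sparse kernel K_NM K_MM^{-1} K_MN
K_sor = K_NM @ np.linalg.solve(K_MM, K_NM.T)
print(np.abs(Psi @ Psi.T - K_sor).max())   # close to zero when nothing is truncated
```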

GPR optimization

In order to find the regression weights used to predict the density expansion coefficients, the following loss function must be minimized:

\[ l(\tilde{\mathbf{b}}_{D}) = (\mathbf{\Psi}_{ND} \tilde{\mathbf{b}}_D - \textbf{c}_N^{\text{DF}})^{T}\mathbf{S}_{NN}(\mathbf{\Psi}_{ND} \tilde{\mathbf{b}}_D - \textbf{c}_N^{\text{DF}}) + \eta \tilde{\mathbf{b}}_{D}^{T} \tilde{\mathbf{b}}_{D} \]

Here \(\mathbf{S}_{NN}\) is the block-diagonal overlap matrix between the auxiliary basis functions, \(\eta\) is a regularization parameter which penalizes high-norm weights, and \(\textbf{c}_N^{\text{DF}}\) is the vector of reference density-fitting coefficients associated with the training configurations.
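
As a hedged sketch, the loss can be evaluated for a candidate weight vector using random stand-ins for \(\mathbf{\Psi}_{ND}\), \(\mathbf{S}_{NN}\), and \(\textbf{c}_N^{\text{DF}}\) (all shapes and values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 120, 40
Psi = rng.normal(size=(N, D))          # stand-in for the feature matrix Psi_ND
A = rng.normal(size=(N, N))
S = A @ A.T / N + np.eye(N)            # symmetric positive-definite "overlap"
c_df = rng.normal(size=N)              # stand-in reference DF coefficients
eta = 1e-6                             # regularization strength

def loss(b):
    resid = Psi @ b - c_df
    return resid @ S @ resid + eta * b @ b

print(loss(np.zeros(D)))               # loss of the trivial (all-zero) weights
```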

Direct inversion

The loss function can be minimized analytically, yielding the following expression for the optimized regression weights \(\tilde{\mathbf{b}}_D\):

\[ \tilde{\mathbf{b}}_D = \left(\mathbf{\Psi}_{ND}^T \cdot \mathbf{S}_{NN} \cdot \mathbf{\Psi}_{ND} + \eta \mathbf{1}_{DD} \right)^{-1} \left(\mathbf{\Psi}_{ND}^T \cdot \mathbf{S}_{NN} \cdot \textbf{c}_N^{\text{DF}}\right). \]

Solving this equation requires the inversion of a \(D \times D\) matrix.
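
A minimal sketch of this direct solution, with the same kind of random stand-ins as in the loss sketch above and a linear solve in place of an explicit matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 120, 40
Psi = rng.normal(size=(N, D))          # stand-in feature matrix Psi_ND
A = rng.normal(size=(N, N))
S = A @ A.T / N + np.eye(N)            # stand-in overlap S_NN
c_df = rng.normal(size=N)              # stand-in reference DF coefficients
eta = 1e-6

lhs = Psi.T @ S @ Psi + eta * np.eye(D)   # the D x D matrix that would be inverted
rhs = Psi.T @ S @ c_df
b_opt = np.linalg.solve(lhs, rhs)         # solve, rather than invert explicitly
print(b_opt.shape)                        # D optimized regression weights
```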

Conjugate gradient method

If a direct inversion is not feasible, the loss function can instead be minimized iteratively by applying the conjugate gradient (CG) method. This is discussed in this paper, and details of the CG algorithm can be found on Wikipedia.
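
A textbook CG loop for a symmetric positive-definite system standing in for the normal equations above might look like this (purely illustrative; it is not SALTED's implementation, and the matrix is formed explicitly only for brevity, whereas in practice just matrix-vector products are needed):

```python
import numpy as np

def conjugate_gradient(A, y, tol=1e-8, maxiter=1000):
    """Solve A x = y for symmetric positive-definite A, without inverting A."""
    x = np.zeros_like(y)
    r = y - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
D = 40
G = rng.normal(size=(D, D))
A = G @ G.T + np.eye(D)                 # plays the role of Psi^T S Psi + eta*1
y = rng.normal(size=D)                  # plays the role of Psi^T S c_DF
x = conjugate_gradient(A, y)
print(np.allclose(A @ x, y, atol=1e-6)) # CG solution satisfies the normal equations
```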