Appendix

Input controlling file structure

This appendix lists all options that can be set in inp.yaml. Some options may be omitted, in which case default values are used. The type column follows Python typing conventions.
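Because omitted options fall back to defaults, a parsed inp.yaml can be pictured as a nested mapping laid over a table of defaults. The following is a minimal sketch of that idea, not SALTED's own parser: the `DEFAULTS` values, the `apply_defaults` helper, and the example file contents are all hypothetical, and the real defaults may differ.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical defaults for illustration only; SALTED's real defaults may differ.
DEFAULTS: Dict[str, Any] = {
    "system": {"average": True, "parallel": False, "field": False},
    "gpr": {"restart": False, "trainsel": "random"},
}


def apply_defaults(user: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `defaults` recursively overridden by `user`."""
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them option by option.
            merged[key] = apply_defaults(value, merged[key])
        else:
            # A user-supplied value always wins over the default.
            merged[key] = deepcopy(value)
    return merged


# Options as they might look after parsing a partial inp.yaml (example values).
user_inp = {
    "salted": {"saltedname": "test", "saltedpath": "./"},
    "system": {"filename": "structures.xyz", "species": ["H", "O"], "parallel": True},
}
inp = apply_defaults(user_inp, DEFAULTS)
```

Here `inp["system"]["parallel"]` keeps the user's value, while `inp["system"]["average"]` and the whole `gpr` section come from the assumed defaults.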

SALTED definition (inp.salted)

| var name | type | usage |
| --- | --- | --- |
| saltedname | str | A label to identify a particular training setup |
| saltedpath | str | Location of all files produced by SALTED |

System definition (inp.system)

| var name | type | usage |
| --- | --- | --- |
| filename | str | An XYZ file containing the input structures |
| species | List[str] | Ordered list of element species |
| average | bool | Whether to use averaged coefficients to set an offset. This should normally be True, unless a delta-density is learned. |
| parallel | bool | Whether to use MPI parallelization |
| field | bool | Whether to use an external field. Set to False to predict densities without an external field. |

Information about QM training set generation (inp.qm)

| var name | type | usage |
| --- | --- | --- |
| path2qm | str | Location of training data |
| qmcode | Union[Literal["aims"], Literal["cp2k"], Literal["pyscf"]] | Which ab initio software was used to generate training data |
| qmbasis | str | Basis set to use when generating the training data (unused by AIMS) |
| dfbasis | str | A label for the auxiliary basis set used to expand the density |
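Since the type column uses Python typing conventions, the allowed values of qmcode can be read directly off its type. The sketch below shows one way to do that with the standard `typing` module; the names `QMCode`, `allowed_values`, and `check_qmcode` are hypothetical helpers for illustration and are not part of SALTED.

```python
from typing import Any, Literal, Tuple, Union, get_args

# Type of qmcode exactly as listed in this section.
QMCode = Union[Literal["aims"], Literal["cp2k"], Literal["pyscf"]]


def allowed_values(tp: Any) -> Tuple[Any, ...]:
    """Flatten a Union of Literal types into a tuple of the literal values."""
    values = []
    for member in get_args(tp):          # each member is one Literal[...]
        values.extend(get_args(member))  # the values inside that Literal
    return tuple(values)


def check_qmcode(value: str) -> str:
    """Return `value` unchanged if it is an allowed qmcode, else raise ValueError."""
    allowed = allowed_values(QMCode)
    if value not in allowed:
        raise ValueError(f"qmcode must be one of {allowed}, got {value!r}")
    return value


qmcode = check_qmcode("cp2k")
```

The same pattern would apply to any other option whose type is a `Union` of `Literal` values.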

Rascaline atomic environment parameters (inp.descriptor.rep[n])

Below, [n] stands for the index of the nth local environment, so rep[n] is written rep1 for the first local environment and rep2 for the second. See the SOAP descriptions for more details.

| var name | type | usage |
| --- | --- | --- |
| type | Union[Literal["rho"], Literal["V"]] | Representation type, "rho" for atomic density and "V" for atomic potential |
| rcut | float | Radial cutoff (Angstrom) |
| nrad | int | Number of radial functions |
| nang | int | Number of angular functions |
| sig | float | Gaussian function width (Angstrom) |
| neighspe | List[str] | Ordered list of atomic species |
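To make the rep[n] naming concrete, the sketch below builds a descriptor section with two local environments and collects them in order. The parameter names follow this section, but every numerical value is an arbitrary example, and the `rep_sections` helper is hypothetical rather than part of SALTED.

```python
from typing import Any, Dict, List

# Example values only; they are not recommended settings.
descriptor: Dict[str, Any] = {
    "rep1": {"type": "rho", "rcut": 4.0, "nrad": 8, "nang": 6,
             "sig": 0.3, "neighspe": ["H", "O"]},
    "rep2": {"type": "V", "rcut": 4.0, "nrad": 8, "nang": 6,
             "sig": 0.3, "neighspe": ["H", "O"]},
    "sparsify": {"nsamples": 100, "ncut": 1000},
}


def rep_sections(desc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect rep1, rep2, ... in order, stopping at the first missing index."""
    sections = []
    n = 1
    while f"rep{n}" in desc:
        sections.append(desc[f"rep{n}"])
        n += 1
    return sections


reps = rep_sections(descriptor)
rep_types = [rep["type"] for rep in reps]
```

Looking the keys up by index rather than iterating over the mapping keeps the environments in the intended order and skips unrelated keys such as `sparsify`.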

Feature sparsification parameters (inp.descriptor.sparsify)

| var name | type | usage |
| --- | --- | --- |
| nsamples | int | Number of structures to use for feature sparsification |
| ncut | int | Maximum number of SOAP features to keep |

Prediction variables (inp.prediction)

| var name | type | usage |
| --- | --- | --- |
| filename | str | An XYZ file containing the structures whose densities we wish to predict |
| predname | str | A label to identify a particular set of predictions |
| predict_data | str | Path to ab initio output for prediction, relative to path2qm |

ML variables (inp.gpr)

| var name | type | usage |
| --- | --- | --- |
| z | float | Kernel exponent \(\zeta\) for GPR |
| Menv | int | Number of reference environments |
| Ntrain | int | Number of training structures |
| trainfrac | float | Training dataset fraction. The training dataset size is \(\mathrm{Ntrain}\times\mathrm{trainfrac}\) |
| regul | float | Regularization parameter \(\eta\) for GPR |
| eigcut | float | Eigenvalue cutoff for RKHS projection |
| gradtol | float | Minimum gradient norm tolerance for CG minimization |
| restart | bool | Whether to restart from a previous minimization checkpoint |
| blocksize | int | Size of the blocks into which the dataset is divided for MPI matrix inversion |
| trainsel | Union[Literal["sequential"], Literal["random"]] | Whether to train sequentially or at random for MPI matrix inversion |
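The relation between Ntrain, trainfrac, and trainsel can be illustrated with a short sketch. It rests on two assumptions that this appendix does not confirm: that the product Ntrain × trainfrac is truncated to an integer, and that trainsel chooses between taking the first structures in order and drawing them at random. The helper names and the `seed` parameter are hypothetical.

```python
import random
from typing import List


def training_size(ntrain: int, trainfrac: float) -> int:
    """Ntrain x trainfrac, truncated to an integer (truncation is an assumption)."""
    return int(ntrain * trainfrac)


def select_training(ntrain: int, trainfrac: float, trainsel: str,
                    seed: int = 0) -> List[int]:
    """Pick structure indices for training (assumed meaning of trainsel)."""
    n = training_size(ntrain, trainfrac)
    if trainsel == "sequential":
        # Take the first n structures in their original order.
        return list(range(n))
    if trainsel == "random":
        # Draw n distinct structures; `seed` is a hypothetical knob for reproducibility.
        rng = random.Random(seed)
        return sorted(rng.sample(range(ntrain), n))
    raise ValueError(f"trainsel must be 'sequential' or 'random', got {trainsel!r}")


sequential_idx = select_training(10, 0.5, "sequential")
random_idx = select_training(10, 0.5, "random")
```

With Ntrain = 10 and trainfrac = 0.5, both modes return five indices; only the choice of which five differs.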