Appendix

Input controlling file structure

This appendix lists all inputs that can be set in `inp.yaml`. Options that are omitted take their default values. The type column follows Python typing conventions.
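
The dotted section names used below (e.g. `inp.system`) correspond to nested mappings in `inp.yaml`. Below is a minimal sketch of reading a section with attribute access and falling back to defaults for omitted options. It assumes the file has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. The helpers `to_namespace` and `merge_defaults` are hypothetical, and the default values shown are placeholders, not SALTED's actual defaults.

```python
from types import SimpleNamespace
from typing import Any, Dict


def to_namespace(obj: Any) -> Any:
    """Recursively convert nested dicts into SimpleNamespace objects."""
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(v) for v in obj]
    return obj


def merge_defaults(user: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `defaults` recursively updated with the values in `user`."""
    merged = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged


# Hypothetical defaults, for illustration only (not SALTED's actual defaults).
DEFAULTS = {"system": {"parallel": False, "field": False, "average": True}}

# What yaml.safe_load might return when reading a minimal inp.yaml.
user_input = {
    "salted": {"saltedname": "test", "saltedpath": "./salted_data"},
    "system": {"filename": "water.xyz", "species": ["H", "O"], "parallel": True},
}

inp = to_namespace(merge_defaults(user_input, DEFAULTS))
print(inp.salted.saltedname, inp.system.parallel, inp.system.field)  # test True False
```

Merging recursively rather than replacing whole sections means a user can override a single key, such as `parallel`, without restating the rest of `inp.system`.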

SALTED definition (inp.salted)

| Variable | Type | Usage |
|---|---|---|
| `saltedname` | `str` | A label identifying a particular training setup |
| `saltedpath` | `str` | Location of all files produced by SALTED |

System definition (inp.system)

| Variable | Type | Usage |
|---|---|---|
| `filename` | `str` | An `XYZ` file containing the input structures |
| `species` | `List[str]` | Ordered list of element species |
| `average` | `bool` | Whether to use averaged coefficients to set an offset. This should normally be `True`, unless a delta-density is learned. |
| `parallel` | `bool` | Whether to use MPI parallelization |
| `field` | `bool` | Whether to include an external field. Set to `False` to predict densities without an external field. |
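
A quick consistency check between `filename` and `species` can catch typos before a long run. The sketch below assumes that every element appearing in the structures should be listed in `species`. It parses plain multi-frame XYZ text with the standard library only; in practice a library such as ASE would normally read the file. `species_in_xyz` and `missing_species` are hypothetical helpers.

```python
from typing import List, Set


def species_in_xyz(text: str) -> Set[str]:
    """Collect element symbols from (possibly multi-frame) plain XYZ text."""
    lines = text.splitlines()
    found: Set[str] = set()
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between frames
            i += 1
            continue
        natoms = int(lines[i].split()[0])
        # Skip the atom-count line and the comment line, then read natoms atom lines.
        for atom_line in lines[i + 2 : i + 2 + natoms]:
            found.add(atom_line.split()[0])
        i += 2 + natoms
    return found


def missing_species(xyz_text: str, species: List[str]) -> Set[str]:
    """Return elements present in the structures but absent from `species`."""
    return species_in_xyz(xyz_text) - set(species)


water = """3
frame 1
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
"""

print(missing_species(water, ["H", "O"]))  # set()
print(missing_species(water, ["O"]))       # {'H'}
```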

Information about QM training set generation (inp.qm)

| Variable | Type | Usage |
|---|---|---|
| `path2qm` | `str` | Location of the training data |
| `qmcode` | `Union[Literal["aims"], Literal["cp2k"], Literal["pyscf"]]` | The ab initio code used to generate the training data |
| `qmbasis` | `str` | Basis set used to generate the training data (ignored when `qmcode` is `"aims"`) |
| `dfbasis` | `str` | A label for the auxiliary basis set used to expand the density |
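
Because the type column uses Python typing notation, a value such as `qmcode` can be checked directly against its `Literal` type with `typing.get_origin` and `typing.get_args`. Below is a minimal sketch; `allowed_values` and `check_qmcode` are hypothetical helpers, and SALTED's own input validation may work differently.

```python
from typing import Literal, Tuple, Union, get_args, get_origin

# Type of `qmcode`, copied from the table above.
QMCode = Union[Literal["aims"], Literal["cp2k"], Literal["pyscf"]]


def allowed_values(tp) -> Tuple[str, ...]:
    """Flatten a Literal, or a Union of Literals, into its permitted values."""
    if get_origin(tp) is Literal:
        return get_args(tp)
    values: Tuple[str, ...] = ()
    for arg in get_args(tp):
        values += allowed_values(arg)
    return values


def check_qmcode(value: str) -> str:
    """Raise ValueError unless `value` is one of the permitted qmcode strings."""
    allowed = allowed_values(QMCode)
    if value not in allowed:
        raise ValueError(f"qmcode must be one of {allowed}, got {value!r}")
    return value


print(allowed_values(QMCode))  # ('aims', 'cp2k', 'pyscf')
check_qmcode("pyscf")          # accepted
```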

Rascaline atomic environment parameters (inp.descriptor.rep[n])

`[n]` below stands for the index of the local environment: for example, the first and second local environments are defined under `rep1` and `rep2`, respectively. See the SOAP descriptor documentation for more details.

| Variable | Type | Usage |
|---|---|---|
| `type` | `Union[Literal["rho"], Literal["V"]]` | Representation type: `"rho"` for atomic density and `"V"` for atomic potential |
| `rcut` | `float` | Radial cutoff (Angstrom) |
| `nrad` | `int` | Number of radial functions |
| `nang` | `int` | Number of angular functions |
| `sig` | `float` | Width of the atomic Gaussian functions (Angstrom) |
| `neighspe` | `List[str]` | Ordered list of neighbor species included in the environment |
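
The `rep[n]` convention means that the descriptor section holds one block per local environment next to other keys such as `sparsify`. Below is a minimal sketch of collecting those blocks in numerical order. The dict mimics a parsed `inp.descriptor` section with illustrative values only, and `representations` is a hypothetical helper.

```python
import re
from typing import Any, Dict, List

# Hypothetical parsed inp.descriptor section with two local environments (values illustrative only).
descriptor = {
    "rep1": {"type": "rho", "rcut": 4.0, "nrad": 8, "nang": 6, "sig": 0.3, "neighspe": ["H", "O"]},
    "rep2": {"type": "V", "rcut": 4.0, "nrad": 8, "nang": 6, "sig": 0.3, "neighspe": ["H", "O"]},
    "sparsify": {"nsamples": 100, "ncut": 1000},
}


def representations(desc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the rep1, rep2, ... blocks sorted by n, checking each `type`."""
    keys = [k for k in desc if re.fullmatch(r"rep\d+", k)]
    keys.sort(key=lambda k: int(k[3:]))  # numeric sort, so rep10 follows rep2
    reps = []
    for key in keys:
        if desc[key]["type"] not in ("rho", "V"):
            raise ValueError(f"{key}.type must be 'rho' or 'V'")
        reps.append(desc[key])
    return reps


for n, rep in enumerate(representations(descriptor), start=1):
    print(f"rep{n}: type={rep['type']}, rcut={rep['rcut']} Angstrom")
```

Sorting on the integer suffix rather than the string keeps `rep10` after `rep2` if many environments are defined.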

Feature sparsification parameters (inp.descriptor.sparsify)

| Variable | Type | Usage |
|---|---|---|
| `nsamples` | `int` | Number of structures used for feature sparsification |
| `ncut` | `int` | Maximum number of SOAP features retained after sparsification |
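
The table does not state how the `ncut` features are chosen. The sketch below assumes farthest-point sampling (FPS) over feature columns, a common choice for sparsifying SOAP features, computed on environments drawn from `nsamples` structures. It is an illustration of the idea, not necessarily SALTED's implementation. `farthest_point_sampling` is a hypothetical helper and the data are random.

```python
import numpy as np


def farthest_point_sampling(features: np.ndarray, ncut: int) -> np.ndarray:
    """Greedily pick `ncut` feature columns that are maximally far apart.

    `features` has shape (n_environments, n_features); each column is treated
    as a point in R^n_environments.
    """
    points = features.T  # one row per feature
    ncut = min(ncut, points.shape[0])
    # Start from the feature with the largest norm.
    selected = [int(np.argmax(np.linalg.norm(points, axis=1)))]
    # Distance from every feature to its nearest already-selected feature.
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    while len(selected) < ncut:
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))  # toy: 50 environments, 200 SOAP features
idx = farthest_point_sampling(X, ncut=20)
X_sparse = X[:, idx]
print(X_sparse.shape)  # (50, 20)
```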

Prediction variables (inp.prediction)

| Variable | Type | Usage |
|---|---|---|
| `filename` | `str` | An `XYZ` file containing the structures whose densities are to be predicted |
| `predname` | `str` | A label identifying a particular set of predictions |
| `predict_data` | `str` | Path to the ab initio output for prediction, relative to `path2qm` |
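
Since `predict_data` is relative to `path2qm`, the full location is obtained by joining the two. Below is a minimal sketch with hypothetical paths; `resolve_predict_data` is a hypothetical helper.

```python
import os


def resolve_predict_data(path2qm: str, predict_data: str) -> str:
    """Join inp.prediction.predict_data onto inp.qm.path2qm."""
    if os.path.isabs(predict_data):
        # os.path.join would silently discard path2qm here, so reject it explicitly.
        raise ValueError("predict_data must be relative to path2qm")
    return os.path.normpath(os.path.join(path2qm, predict_data))


print(resolve_predict_data("/data/water_qm", "predictions/aims_out"))
# /data/water_qm/predictions/aims_out
```

The explicit absolute-path check matters because `os.path.join` drops every earlier component when a later one is absolute, which would otherwise hide a misconfigured `predict_data`.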

ML variables (inp.gpr)

| Variable | Type | Usage |
|---|---|---|
| `z` | `float` | Kernel exponent \(\zeta\) for GPR |
| `Menv` | `int` | Number of reference environments |
| `Ntrain` | `int` | Number of training structures |
| `trainfrac` | `float` | Fraction of the training structures actually used. The training set size is \(\mathrm{Ntrain}\times\mathrm{trainfrac}\) |
| `regul` | `float` | Regularization parameter \(\eta\) for GPR |
| `eigcut` | `float` | Eigenvalue cutoff for the RKHS projection |
| `gradtol` | `float` | Gradient-norm tolerance at which the conjugate-gradient (CG) minimization is considered converged |
| `restart` | `bool` | Whether to restart from a previous minimization checkpoint |
| `blocksize` | `int` | Size of the blocks into which the dataset is divided for MPI matrix inversion |
| `trainsel` | `Union[Literal["sequential"], Literal["random"]]` | Whether training structures are selected sequentially or at random (used with MPI matrix inversion) |
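
The sketch below shows how these parameters might fit together in a generic sparse-GPR workflow:

- `trainfrac` and `trainsel` pick the training structures, and `blocksize` chunks them.
- `z` is the exponent of a polynomial kernel on normalized features.
- The reference kernel is eigendecomposed and truncated at `eigcut` to build an RKHS projection.
- A regularized least-squares problem with weight `regul` is solved by CG until the residual norm drops below `gradtol`.

All function names are hypothetical and the data are random. SALTED's actual kernels (λ-SOAP), loss function, reference-environment selection, and MPI distribution differ in detail.

```python
import numpy as np


def select_training_set(ntrain: int, trainfrac: float, trainsel: str, seed: int = 0) -> np.ndarray:
    """Indices of the structures used for training, drawn from range(ntrain)."""
    # Training set size is Ntrain x trainfrac (rounded to guard against float error).
    n_used = int(round(ntrain * trainfrac))
    if trainsel == "sequential":
        return np.arange(n_used)
    if trainsel == "random":
        rng = np.random.default_rng(seed)
        return np.sort(rng.choice(ntrain, size=n_used, replace=False))
    raise ValueError("trainsel must be 'sequential' or 'random'")


def blocks(indices: np.ndarray, blocksize: int) -> list:
    """Split indices into consecutive chunks of at most `blocksize` entries."""
    return [indices[i:i + blocksize] for i in range(0, len(indices), blocksize)]


def rkhs_projector(k_mm: np.ndarray, eigcut: float) -> np.ndarray:
    """Projector built from the eigenvectors of K_MM whose eigenvalues exceed eigcut."""
    evals, evecs = np.linalg.eigh(k_mm)
    keep = evals > eigcut  # drop near-singular directions
    return evecs[:, keep] / np.sqrt(evals[keep])


def cg_regression(psi: np.ndarray, y: np.ndarray, regul: float, gradtol: float,
                  maxiter: int = 1000) -> np.ndarray:
    """Minimise ||psi w - y||^2 + regul ||w||^2 by conjugate gradients."""
    a = psi.T @ psi + regul * np.eye(psi.shape[1])
    b = psi.T @ y
    w = np.zeros(psi.shape[1])
    r = b - a @ w  # residual = minus half the gradient of the loss
    p = r.copy()
    for _ in range(maxiter):
        if np.linalg.norm(r) < gradtol:  # stop once the (half-)gradient norm is below gradtol
            break
        ap = a @ p
        alpha = (r @ r) / (p @ ap)
        w += alpha * p
        r_new = r - alpha * ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return w


# Toy parameters in the spirit of inp.gpr (values are illustrative only).
z, Menv, regul, eigcut, gradtol = 2.0, 15, 1e-6, 1e-10, 1e-8

train = select_training_set(ntrain=100, trainfrac=0.5, trainsel="random")
print(len(train), [len(b) for b in blocks(train, blocksize=20)])  # 50 [20, 20, 10]

rng = np.random.default_rng(1)
X = rng.random((len(train), 10))              # non-negative toy features, one row per environment
X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalise so that X @ X.T lies in [0, 1]
y = rng.normal(size=len(train))               # toy targets standing in for density coefficients

ref = X[:Menv]                                # reference environments: simply the first Menv here
k_nm = (X @ ref.T) ** z                       # polynomial kernel with exponent zeta
k_mm = (ref @ ref.T) ** z
psi = k_nm @ rkhs_projector(k_mm, eigcut)     # features projected onto the truncated RKHS basis
w = cg_regression(psi, y, regul, gradtol)
```

Truncating at `eigcut` removes directions in which the reference kernel is numerically singular, and this conditioning lets CG converge to a tight `gradtol`.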