Workflow and First Steps
SALTED workflow
- Train SAGPR model (Part 2: setup dataset and Part 3: learn the density)
  - Calculate electron density and density fitting (DF) coefficients by FHI-aims.
  - Generate (optionally sparse) \(\lambda\)-SOAP descriptors by rascaline, and sparsify the atomic environments by farthest point sampling (FPS) method.
  - Calculate RKHS related quantities, including kernel matrix \(\mathbf{K}_{MM}\), associated projectors, and the feature vector \(\mathbf{\Psi}_{ND}\).
  - Optimize GPR weights by either direct inversion or CG method, and save the optimized weights.
  - Validate the model if necessary.
- Predict the density and calculate derived properties of new structures (Part 4: predict properties)
  - Predict density fitting coefficients using the GPR weights obtained in the previous step. Save the predicted density coefficients.
  - Read the predicted density coefficients and density fitting coefficients with FHI-aims and run one diagonalization of the KS Hamiltonian (\(\rho \rightarrow H_{KS} \rightarrow \text{everything}\))
  - Parse the output files of AIMS for derived properties, e.g. total/XC/electrostatic energy, forces, etc. Note that some quantities which depend only on the density (e.g. dipoles, electrostatic energy, etc.) do not require the diagonalization above.
Overview before starting
Related starting files
file or dir name | description |
---|---|
README.rst | README file for your reference |
inp.yaml | SALTED input file, consists of file paths and hyperparameters |
control.in | FHI-aims control file, for generating dataset with FHI-aims |
run-aims.sbatch | Example script for generating dataset with FHI-aims (only an example, everything can be run locally) |
water_monomers_100.xyz | xyz file, training dataset. Units: Angstrom (Å) |
We are going to
- Generate a dataset using FHI-aims.
- Record atomic basis information.
- Benchmark training dataset.
No MPI interface?
Don't worry! Just change the parameter parallel in inp.yaml to False.
For all MPI \(\otimes\) Python commands in this tutorial, e.g.
mpirun -np $ntasks python -m salted.aims.move_data
just run without mpirun:
python -m salted.aims.move_data
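The rule above can be written down as a small helper. This is only an illustrative sketch: the function name salted_command and the default task count are hypothetical and not part of SALTED; the flag simply mirrors the parallel entry of inp.yaml.

```python
def salted_command(module, parallel=True, ntasks=4):
    """Build the command line for a SALTED module.

    `parallel` mirrors the `parallel` entry of inp.yaml; `ntasks` is a
    hypothetical task count chosen by the user.
    """
    base = ["python", "-m", module]
    if parallel:
        # With an MPI interface: prefix the Python call with mpirun.
        return ["mpirun", "-np", str(ntasks)] + base
    # Without MPI: run the very same module serially.
    return base


print(" ".join(salted_command("salted.aims.move_data", parallel=True, ntasks=4)))
print(" ".join(salted_command("salted.aims.move_data", parallel=False)))
```

The resulting list can be passed directly to subprocess.run if you prefer to drive the tutorial from a Python script.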
Input geometry
For the whole project, we will stay in the working root dir example/water_monomer_AIMS, where inp.yaml is located.
cd $path_to_salted_examples/water_monomer_AIMS
Before we start, check the filename entry in inp.yaml and make sure it is water_monomers_100.xyz. This is the training dataset.
About inp.yaml
The inp.yaml file consists of file paths, hyperparameters for machine-learning models, and controlling arguments.
The parameters will be indicated by inp.[parameter] in the following text.
For a detailed description of each parameter, please refer to the Appendix.
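To make the inp.[parameter] notation concrete, here is a minimal sketch of how a dotted name such as inp.qm.path2qm maps onto a nested configuration. The helper get_param and the truncated dictionary are hypothetical stand-ins for illustration; they are not SALTED's own parser, and the real inp.yaml contains many more entries.

```python
def get_param(config, dotted_name):
    """Look up a nested entry such as "qm.path2qm" in a parsed inp.yaml."""
    value = config
    for key in dotted_name.split("."):
        value = value[key]  # descend one level per dot
    return value


# Hypothetical, heavily truncated stand-in for a parsed inp.yaml.
inp = {"qm": {"path2qm": "qmdata"}}

# inp.qm.path2qm is the root of all FHI-aims data in this tutorial.
geoms_dir = f"{get_param(inp, 'qm.path2qm')}/data/geoms"
print(geoms_dir)
```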
FHI-aims needs a control.in file and a geometry.in file to start a calculation. The control.in file is the same for every structure, and the geometry.in files are generated by salted.aims.make_geoms from water_monomers_100.xyz:
python -m salted.aims.make_geoms
This will generate the input files of all 100 water monomers in "[inp.qm.path2qm]/data/geoms/[n].in"="qmdata/data/geoms/[n].in" with [n] running from 1 to 100. Later, these files will be moved into each FHI-aims calculation dir and renamed to geometry.in.
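For orientation, the conversion from a multi-frame xyz file to per-structure FHI-aims geometry files can be sketched as follows. This is not the implementation of salted.aims.make_geoms; the function names and the sample coordinates are made up for illustration. The only format facts used are that each xyz frame starts with an atom count and a comment line, and that a non-periodic geometry.in lists atoms as "atom x y z species" in Angstrom.

```python
SAMPLE_XYZ = """3
water 1 (illustrative coordinates)
O 0.0000 0.0000 0.1173
H 0.0000 0.7572 -0.4692
H 0.0000 -0.7572 -0.4692
3
water 2 (illustrative coordinates)
O 0.0000 0.0000 0.1200
H 0.0000 0.7600 -0.4700
H 0.0000 -0.7600 -0.4700
"""


def read_xyz_frames(text):
    """Split a multi-frame xyz string into frames of (symbol, x, y, z)."""
    lines = text.splitlines()
    frames, i = [], 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between frames
            i += 1
            continue
        natoms = int(lines[i].split()[0])
        atom_lines = lines[i + 2 : i + 2 + natoms]  # skip the comment line
        frame = []
        for line in atom_lines:
            sym, x, y, z = line.split()[:4]
            frame.append((sym, float(x), float(y), float(z)))
        frames.append(frame)
        i += 2 + natoms
    return frames


def to_geometry_in(frame):
    """Format one frame as FHI-aims geometry.in text (Cartesian, Angstrom)."""
    return "\n".join(
        f"atom {x:.8f} {y:.8f} {z:.8f} {sym}" for sym, x, y, z in frame
    ) + "\n"


frames = read_xyz_frames(SAMPLE_XYZ)
for n, frame in enumerate(frames, start=1):  # [n] starts at 1
    print(f"qmdata/data/geoms/{n}.in")
    print(to_geometry_in(frame))
```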
Control file
In the provided control.in file, there are 3 special tags for SALTED:
ri_density_restart write
ri_full_output .True.
ri_output_density_only .True.
Control tags ri_xxx and options details
You can find more details about these tags and options in the FHI-aims PDF documentation.
ri_density_restart [task] [value]
- If [task] is write
  - [value] is an optional positive float which defines a cutoff radius for calculating the overlap of the product basis functions. Default [value] is 1.5.
  - FHI-aims will write restart coefficients to a file ri_restart_coeffs.out, which should be renamed to ri_restart_coeffs_df.out manually and will be used by SALTED later.
- If [task] is read
  - [value] is an optional non-negative integer specifying the maximum number of SCF steps to be performed after reading the density. Default [value] is the value of tag sc_iter_limit, which is by default 1000.
- If [task] is read_and_write
  - FHI-aims will do both read and write.
- Default: do nothing.

ri_full_output [boolean]
- If [boolean] is .True. and ri_density_restart is write
  - FHI-aims will write (by subroutine output_idx_info())
    - overlap matrix to file ri_ovlp.out, and density projection to file ri_projections.out
    - information about the product basis needed to interface with the SALTED framework to files prodbas_condon_shotley_list.out, idx_prodbas.out, basis_info.out, idx_prodbas_details.out
    - density and partition table on the internal real-space grid used by FHI-aims to file partition_tab.out.
- If [boolean] is .False. (default)
  - FHI-aims will not write the above files.

ri_output_density_only [boolean]
- If [boolean] is .True.
  - FHI-aims will write the density to file rho_scf.out (at scf_solver.f90/write_rho()).
- If [boolean] is .False. (default)
  - FHI-aims will not write the above file.
If you need help with FHI-aims
For basics of using FHI-aims, please refer to this tutorial: Basics of Running FHI-aims.
For the full FHI-aims user guide, please download the latest PDF documentation, or search for the version compatible with your FHI-aims installation. The PDF covers every FHI-aims keyword and option.
Run FHI-aims
To run the FHI-aims calculations, please write your own script based on run-aims.sbatch (this is merely an example), adapted to your cluster or PC, and then run it. If you are using an HPC system (not necessary for this exercise), the command would be:
sbatch run-aims.sbatch
Otherwise, something like
bash run-aims.sh
will suffice. If you struggle to adapt this script to your needs, contact one of the people responsible for this tutorial.
For each FHI-aims calculation, the script should
- Copy control.in (from working root dir) and each geometry.in (from [inp.qm.path2qm]/data/geoms="qmdata/data/geoms") to the working dir.
- Run FHI-aims
- Rename rho_rebuilt_ri.out to rho_df.out, rename ri_restart_coeffs.out to ri_restart_coeffs_df.out
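The three steps above can be sketched in Python as an alternative to a shell script. Treat this as an illustration under assumptions, not as the provided run-aims.sbatch: the function names, the binary name aims.x and the task count are hypothetical and must be adapted to your installation. The dry run at the end uses a stand-in runner so the file handling can be checked without FHI-aims.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_one(idx, root, run_aims, qm_path="qmdata"):
    """Prepare, run, and post-process one FHI-aims calculation."""
    root = Path(root)
    workdir = root / qm_path / "data" / str(idx)
    workdir.mkdir(parents=True, exist_ok=True)

    # 1. Copy the shared control.in and this structure's geometry file.
    shutil.copy(root / "control.in", workdir / "control.in")
    shutil.copy(root / qm_path / "data" / "geoms" / f"{idx}.in",
                workdir / "geometry.in")

    # 2. Run FHI-aims inside the working dir (runner supplied by the caller).
    run_aims(workdir)

    # 3. Rename the outputs so they are marked as DF (training) data.
    (workdir / "rho_rebuilt_ri.out").rename(workdir / "rho_df.out")
    (workdir / "ri_restart_coeffs.out").rename(workdir / "ri_restart_coeffs_df.out")
    return workdir


def run_aims_with_mpi(workdir, aims_binary="aims.x", ntasks=4):
    """Hypothetical runner: binary name and task count depend on your setup."""
    with open(workdir / "aims.out", "w") as out:
        subprocess.run(["mpirun", "-np", str(ntasks), aims_binary],
                       cwd=workdir, stdout=out, check=True)


def fake_runner(workdir):
    """Stand-in for FHI-aims, used only to dry-run the file handling."""
    for name in ("rho_rebuilt_ri.out", "ri_restart_coeffs.out"):
        (workdir / name).write_text("placeholder\n")


# Dry run in a temporary directory; no FHI-aims installation is needed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "control.in").write_text("# control\n")
    geoms = root / "qmdata" / "data" / "geoms"
    geoms.mkdir(parents=True)
    (geoms / "1.in").write_text("atom 0 0 0 O\n")
    workdir = run_one(1, root, fake_runner)
    produced = sorted(p.name for p in workdir.iterdir())
print(produced)
```

For a real run, pass run_aims_with_mpi (or your own runner) instead of fake_runner and loop idx over 1 to 100.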
Why rename outputs?
This is for the sake of distinguishing training and prediction.
There will be two kinds of rebuilt densities / DF coefficients: either from FHI-aims for training, or from SALTED for prediction (see Part 4).
- The current ri_restart_coeffs.out holds the DF coefficients, which are used for training SALTED, so it is suffixed by _df.
- The current rho_rebuilt_ri.out is the real-space density rebuilt from the DF coefficients mentioned above, so it is also suffixed by _df. This file will later be used for benchmarking the DF accuracy.
Output files
After the FHI-aims calculation, each FHI-aims working dir ([inp.qm.path2qm]/data/[idx]="qmdata/data/[idx]") contains
file name | description | FHI-aims tag |
---|---|---|
FHI-aims basic files | | |
control.in | moved in by script, from control.in in working root dir | --- |
geometry.in | moved in by script, from qmdata/data/geoms/[idx] | --- |
aims.out | FHI-aims output | --- |
real-space density related outputs | | |
partition_tab.out | space grid in FHI-aims integral | ri_full_output .True. |
rho_df.out | renamed by script from rho_rebuilt_ri.out, df for density fitting | --- |
(rho_rebuilt_ri.out) | reconstructed real-space electron density by RI/DF, columns=(x,y,z,density), renamed by script to rho_df.out | ri_full_output .True. |
rho_scf.out | real-space electron density, columns=(x,y,z,density) | ri_output_density_only .True. -> ri_output_density = .True. |
RI / DF related outputs | (Notice: \(\text{RI} \Leftrightarrow \text{DF}\) refer to the same procedure) | |
ri_ovlp.out | overlap matrix of DF basis \(\mathbf{S}_{NN}\) | ri_full_output .True. |
ri_restart_coeffs_df.out | renamed by script from ri_restart_coeffs.out, df for DF | --- |
(ri_restart_coeffs.out) | DF coefficients (\(\mathbf{c}_{N}^{DF}\)), renamed by script to ri_restart_coeffs_df.out | ri_density_restart write |
ri_projections.out | \(\mathbf{S}_{NN} \mathbf{c}_{N}^{DF} = \left\langle \phi_{i,\sigma} (\mathbf{0}) \middle\vert \rho^{QM} \right\rangle\), projected electron density | ri_full_output .True. |
product basis related outputs | | |
basis_info.out | general information about the product basis | ri_full_output .True. |
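The relation listed for ri_projections.out is the normal equation of a least-squares fit. The short derivation below assumes real basis functions \(\phi_i\) and a fit in the overlap metric; the shorthand \(\mathbf{w}_{N}\) for the projection vector and the simplified single index \(i\) are introduced here only for illustration.

```latex
\begin{aligned}
\rho^{DF}(\mathbf{r}) &= \sum_{i} c_i^{DF}\, \phi_i(\mathbf{r}), \\
\Delta(\mathbf{c}) &= \int \Big| \rho^{QM}(\mathbf{r}) - \sum_{i} c_i\, \phi_i(\mathbf{r}) \Big|^2 \, d\mathbf{r}, \\
\frac{\partial \Delta}{\partial c_i} = 0
\;&\Rightarrow\; \sum_{j} \langle \phi_i | \phi_j \rangle\, c_j^{DF} = \langle \phi_i | \rho^{QM} \rangle
\;\Leftrightarrow\; \mathbf{S}_{NN}\, \mathbf{c}_{N}^{DF} = \mathbf{w}_{N}.
\end{aligned}
```

Under these assumptions, the three RI/DF outputs are redundant by one: any two of the overlap matrix, the coefficients and the projections determine the third, which gives a simple consistency check on the data.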
Collecting AIMS outputs
The RI outputs from each AIMS calculation need to be collected into a single folder, and are converted into numpy format to speed up reading these quantities in further steps. For that, we will run:
mpirun -np $ntasks python -m salted.aims.move_data
where $ntasks needs to be replaced by the number of MPI tasks you wish to use on your machine.
Three directories, namely overlaps, projections, and coefficients, are generated at the working root dir, consisting of the collected data from FHI-aims output files.
data name | physics quantity | from file (in dir qmdata/data/[n]) | to file (at working root) |
---|---|---|---|
overlap matrices | \(\mathbf{S}_{NN}\) | ri_ovlp.out | overlaps/overlap_conf[n].npy |
DF projections | \(\mathbf{S}_{NN} \mathbf{c}_{N}^{DF}\) | ri_projections.out | projections/projections_conf[n].npy |
DF coefficients | \(\mathbf{c}_{N}^{DF}\) | ri_restart_coeffs_df.out | coefficients/coefficients_conf[n].npy |
Notice that n ranges from 1 to 100 across the training dataset, and that the overlap files use the singular prefix overlap_conf[n].npy, unlike the projections and coefficients files.
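The file mapping in the table above can be summarized in a few lines. This sketch only builds the source and target paths for one structure; it does not read or convert any data and is not the code of salted.aims.move_data. The names COLLECTED and collection_plan are hypothetical.

```python
from pathlib import PurePosixPath

# (source file in qmdata/data/[n], target directory, target file stem)
COLLECTED = [
    ("ri_ovlp.out", "overlaps", "overlap"),
    ("ri_projections.out", "projections", "projections"),
    ("ri_restart_coeffs_df.out", "coefficients", "coefficients"),
]


def collection_plan(n, qm_path="qmdata"):
    """Map each FHI-aims output of structure n to its collected .npy file."""
    src_dir = PurePosixPath(qm_path) / "data" / str(n)
    return {
        str(src_dir / src): str(PurePosixPath(target_dir) / f"{stem}_conf{n}.npy")
        for src, target_dir, stem in COLLECTED
    }


plan = collection_plan(7)
for source, target in plan.items():
    print(f"{source} -> {target}")
```

A mapping like this is handy for checking, before training, that every structure has all three collected files.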
Reordering coefficients (AIMS version < 240403)
Due to different spherical harmonics conventions, the overlap matrix / DF projection / DF coefficients should be reordered and the Condon-Shortley convention should be applied (which includes the Condon-Shortley phase factor \((-1)^m\) in the definition of spherical harmonics) before using them in SALTED. In newer versions of AIMS, this is done before outputting; in older versions, this is also done by salted.aims.move_data, based on the additional output files idx_prodbas.out and prodbas_condon_shotley_list.out. The sequence after reordering is described in idx_prodbas_details.out (see the table below for column names), and the reordering follows (m_num, l_num, atom_idx) (increasing sorting importance) in ascending order.
file name | description | FHI-aims tag |
---|---|---|
idx_prodbas.out | reordering index for spherical harmonics (see below) | ri_full_output .True. |
idx_prodbas_details.out | basis info, columns = (reordering_idx, atom_idx, l_num, bas_idx, m_num), not used later | ri_full_output .True. |
prodbas_condon_shotley_list.out | Condon-Shortley \((-1)^{m}\) phase factor index | ri_full_output .True. |
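To illustrate what reordering plus a phase change means for the three quantities, here is a toy sketch in plain Python. It is not the code of salted.aims.move_data and makes several assumptions: indices are 0-based, the sign-flip list refers to positions in the new ordering, and the labels and flip list are invented rather than read from idx_prodbas.out or prodbas_condon_shotley_list.out.

```python
def sort_order(labels):
    """Indices that sort (atom_idx, l_num, m_num) labels in ascending order.

    atom_idx is the most important key and m_num the least; ties keep
    their original order because Python's sort is stable.
    """
    return sorted(range(len(labels)), key=lambda k: labels[k])


def signs_for(n, flip):
    """+1 everywhere, -1 at the (new-order) positions listed in flip."""
    flip = set(flip)
    return [-1.0 if i in flip else 1.0 for i in range(n)]


def reorder_vector(vec, order, flip):
    s = signs_for(len(order), flip)
    return [s[i] * vec[src] for i, src in enumerate(order)]


def reorder_matrix(mat, order, flip):
    s = signs_for(len(order), flip)
    return [[s[i] * s[j] * mat[a][b] for j, b in enumerate(order)]
            for i, a in enumerate(order)]


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


# Toy data: four basis functions on one atom, labelled (atom_idx, l_num, m_num).
labels = [(1, 1, 1), (1, 1, -1), (1, 1, 0), (1, 0, 0)]
order = sort_order(labels)
flip = [3]  # hypothetical phase list: flip the function with m = 1

overlap = [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]
coeffs = [10.0, 20.0, 30.0, 40.0]
proj = matvec(overlap, coeffs)

new_coeffs = reorder_vector(coeffs, order, flip)
new_proj = reorder_vector(proj, order, flip)
new_overlap = reorder_matrix(overlap, order, flip)
print(order, new_coeffs)
```

Because the same permutation and signs are applied to rows and columns, the relation between overlap, coefficients and projections still holds after the transformation, which is why all three quantities must be converted together.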
Note that the script will automatically determine the version of AIMS used; the same SALTED command should be used regardless of the version of AIMS.
Also, the product basis information is really important for SALTED (for generating \(\lambda\)-SOAP kernels), and we should transfer such information from basis_info.out to the SALTED basis database basis_data.yaml, which is stored with your local installation of SALTED.
This is achieved by running
python -m salted.get_basis_info
Benchmark dataset
Because the RI/density fitting procedure is not exact, we wish to check the accuracy of the fitted density. This is achieved by running
python -m salted.aims.get_df_err
Real-space densities read from rho_df.out (DF) and rho_scf.out (SCF density matrix) are compared as an average error over a dense real-space grid, and the mean absolute error (in percent) is written to df_maes at working root dir.
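As a rough picture of such a benchmark, the sketch below computes a grid-weighted mean absolute error in percent. The normalization by the integrated SCF density and the use of per-point integration weights are assumptions made for this example; the exact definition used by salted.aims.get_df_err is not given in this text. The function name and the sample numbers are hypothetical.

```python
def df_error_percent(rho_df, rho_scf, weights):
    """Weighted mean absolute density error, relative to the SCF density, in %.

    rho_df, rho_scf: density values on the same real-space grid points.
    weights: integration weight of each grid point (assumed to be available).
    """
    abs_err = sum(w * abs(a - b) for a, b, w in zip(rho_df, rho_scf, weights))
    norm = sum(w * b for b, w in zip(rho_scf, weights))
    return 100.0 * abs_err / norm


# Made-up values on a three-point grid with unit weights.
rho_scf = [1.0, 2.0, 1.0]
rho_df = [1.1, 1.9, 1.0]
weights = [1.0, 1.0, 1.0]

err = df_error_percent(rho_df, rho_scf, weights)
print(f"DF error: {err:.3f} %")
```

Identical densities give an error of zero, so large values point to an insufficient DF basis or to a problem in the generated data.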