Predict new structure and collect outputs

Overview before starting

We are going to:

1. Predict electron densities with SALTED for water dimers (having trained only on monomers).
2. Calculate derived properties from the predictions with FHI-aims.
3. Compare the results with the reference values from FHI-aims.

Related starting files

| File or directory | Description |
| --- | --- |
| `README.rst` | README file for your reference |
| `inp.yaml` | SALTED input file, containing file paths and hyperparameters |
| `control_read_setup.in` | FHI-aims control file, for preparing the basis info needed to reorder data |
| `control_read.in` | FHI-aims control file, for predicting properties from SALTED outputs |
| `run-aims-predict.sbatch` | Example sbatch script, for predicting properties from SALTED outputs |
| `water_dimers_10.xyz` | `xyz` file, the prediction dataset |

Predict densities of new structures

We will use `water_dimers_10.xyz` as the prediction dataset.

Before running the prediction, check `inp.yaml` and make sure the `inp.prediction.filename` entry is `water_dimers_10.xyz`. To predict other structures, prepare your own xyz file and set `inp.prediction.filename` to its name. Do not skip this check; otherwise the prediction will run on the wrong structures.
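Before launching the run, it can help to confirm how many configurations the xyz file actually holds. The sketch below is a standalone helper, not part of SALTED; the function name `count_xyz_frames` is hypothetical, and it assumes the plain xyz layout (an atom-count line, one comment line, then one line per atom, repeated for each configuration).

```python
from pathlib import Path


def count_xyz_frames(path):
    """Count configurations in a multi-frame xyz file.

    Assumes the plain xyz layout: an atom-count line, one comment line,
    then that many atom lines, repeated for every configuration.
    """
    lines = Path(path).read_text().splitlines()
    frames = 0
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i].split()[0])
        if i + 2 + natoms > len(lines):
            raise ValueError(f"frame {frames + 1} is truncated")
        i += 2 + natoms  # skip count line, comment line and atom lines
        frames += 1
    return frames


if __name__ == "__main__":
    import sys

    # Usage: python count_frames.py water_dimers_10.xyz
    if len(sys.argv) > 1:
        print(count_xyz_frames(sys.argv[1]))
```

The returned count is the number of configurations the prediction step will process.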

To conduct the prediction, run

mpirun -np $ntasks python -m salted.prediction

The predicted DF coefficients are stored at `predictions_[inp.salted.saltedname]/M[inp.gpr.Menv]_zeta[inp.gpr.z]/N[ntrain]_reg[inp.gpr.regul]/COEFFS-[n].dat`, where `[n]` refers to the $n$-th configuration in `water_dimers_10.xyz`. Note that the `COEFFS` files are 1-indexed, unlike the numpy files produced during Part 2, which are 0-indexed. `ntrain` is given by `inp.gpr.Ntrain * inp.gpr.trainfrac`.
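To make the path pattern and the indexing concrete, here is a minimal sketch that assembles the location of one coefficient file from the input values. The function name and all example values are hypothetical. How SALTED formats the numeric values inside the directory names (for instance the regularization) is not stated here, so the plain f-string formatting below is an assumption, as is the truncation of `ntrain` to an integer; compare the result against the directory actually created on disk.

```python
from pathlib import Path


def coeffs_path(saltedname, menv, zeta, ntrain_total, trainfrac, regul, n):
    """Build the COEFFS path for the n-th configuration (n is 1-indexed)."""
    if n < 1:
        raise ValueError("COEFFS files are 1-indexed, so n must be >= 1")
    # ntrain = inp.gpr.Ntrain * inp.gpr.trainfrac (integer truncation assumed)
    ntrain = int(ntrain_total * trainfrac)
    return (
        Path(f"predictions_{saltedname}")
        / f"M{menv}_zeta{zeta}"
        / f"N{ntrain}_reg{regul}"
        / f"COEFFS-{n}.dat"
    )


def coeffs_index(zero_based_index):
    """Map a 0-based index (as in the Part 2 numpy files) to a COEFFS index."""
    return zero_based_index + 1


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    first = coeffs_path("demo", 30, 2.0, 100, 1.0, 1e-10, coeffs_index(0))
    print(first.as_posix())
```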

Calculate derived properties of new structures

The example sbatch script run-aims-predict.sbatch is provided for reference and will execute the commands described in this section; you should adapt it to your own computer.

Predict electron density by SALTED

It is recommended to use AIMS version >= 240403 when performing predictions based on SALTED coefficients, as this significantly simplifies the workflow.

To set up the AIMS calculations, first run `salted.aims.make_geoms --predict`; the `--predict` flag creates an AIMS geometry file for each structure in the prediction set in the folder `inp.predict_data/geoms`. Then create a directory for each prediction calculation, and copy `control_read.in` to `control.in` in each directory along with the corresponding `geometry.in` file. Finally, the coefficients need to be added to each working directory for the AIMS calculations. This is achieved by running `salted.aims.move_data_in`, which produces a file `ri_restart_coeffs_predicted.out` in each directory. This file must be renamed to `ri_restart_coeffs.out` before running AIMS.
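The directory setup described above can be scripted. The following is a minimal sketch, not the provided sbatch script: the helper name `prepare_run_dirs` is hypothetical, and it assumes that every `*.in` file in the geoms folder is one geometry and that each working directory is named after the geometry file stem. Check which directory layout `salted.aims.move_data_in` expects before relying on these names.

```python
import shutil
from pathlib import Path


def prepare_run_dirs(geoms_dir, control_file, run_root):
    """Create one AIMS working directory per geometry file.

    Assumption: each *.in file in geoms_dir is one geometry, and the
    working directory is named after the file stem.
    """
    run_root = Path(run_root)
    created = []
    for geom in sorted(Path(geoms_dir).glob("*.in")):
        workdir = run_root / geom.stem
        workdir.mkdir(parents=True, exist_ok=True)
        # control_read.in becomes control.in in every working directory
        shutil.copy(control_file, workdir / "control.in")
        # the structure-specific geometry becomes geometry.in
        shutil.copy(geom, workdir / "geometry.in")
        created.append(workdir)
    return created
```

A call such as `prepare_run_dirs("geoms", "control_read.in", "runs")` would then leave one ready-to-fill directory per structure under `runs/`; the paths in this call are placeholders.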

Instructions when using AIMS version < 240403

Before reading the predicted coefficients into FHI-aims, we need to reorder the DF coefficients from SALTED to conform with the spherical-harmonic convention in FHI-aims. To get the necessary files, we first run FHI-aims with `control_read_setup.in` and `geometry.in` in the working dir (the prerun). `salted.aims.move_data_in_reorder` then performs the reordering based on `idx_prodbas.out` and `prodbas_condon_shotley_list.out` from the prerun, and writes the result to `ri_restart_coeffs_predicted.out`.

The example sbatch script `run-aims-predict-reorder.sbatch` is provided only for reference, and you should adapt it to your own computer. Have a look at the file `control_read_setup.in` to understand the RI-related flags needed in this case.

sbatch run-aims-predict.sbatch

For each FHI-aims calculation, the script should:

1. Copy `control_read_setup.in` to `control.in` and `geometry.in` to the working dir.
2. Run the FHI-aims prerun across all structures.
    - During the prerun, FHI-aims only generates `idx_prodbas.out` and `prodbas_condon_shotley_list.out`, which are needed for reordering and rephasing the DF coefficients.
    - You might see FHI-aims outputs ending like this:
```
An error led to a call to aims_stop_coll, but without a specific message. A detailed    message may be in another file
```
  This can be ignored, as the code simply stops after generating the necessary lists, without doing anything else.
3. Run `python -m salted.aims.move_data_in_reorder`.
4. Rename `ri_restart_coeffs_predicted.out` to `ri_restart_coeffs.out`.
5. Run an FHI-aims prediction calculation across all structures.
    - There will be only one SCF iteration (one diagonalization).
6. Rename `rho_rebuilt_ri.out` to `rho_ml.out`, and `ri_restart_coeffs.out` to `ri_restart_coeffs_ml.out`.
    - The reason is explained in Part 1.
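The two renaming steps in the list above are easy to get wrong when done by hand across many directories. A minimal sketch follows, using only the file names given in the steps; the helper names are hypothetical and the sbatch script may well do this with plain `mv` commands instead.

```python
from pathlib import Path


def rename_in(workdir, old_name, new_name):
    """Rename workdir/old_name to workdir/new_name; fail loudly if missing."""
    src = Path(workdir) / old_name
    if not src.is_file():
        raise FileNotFoundError(src)
    src.rename(Path(workdir) / new_name)


def stage_predicted_coeffs(workdir):
    """Before the prediction run: give the coefficients the name AIMS reads."""
    rename_in(workdir, "ri_restart_coeffs_predicted.out", "ri_restart_coeffs.out")


def collect_ml_outputs(workdir):
    """After the prediction run: mark the outputs as ML results."""
    rename_in(workdir, "rho_rebuilt_ri.out", "rho_ml.out")
    rename_in(workdir, "ri_restart_coeffs.out", "ri_restart_coeffs_ml.out")
```

Raising on a missing file makes a failed FHI-aims run visible immediately, instead of surfacing later as a missing `rho_ml.out`.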

The `rho_ml.out` and `ri_restart_coeffs_ml.out` files contain the predicted densities and DF coefficients.

Validate the predicted densities

To further check the prediction results, we can compare the predicted densities with the reference values from FHI-aims. We will reuse the script `run-aims.sbatch` to calculate these references, but remember to change the bash variable `$DATADIR` to `[inp.qm.path2qm]+[inp.prediction.predict_data]`. You can also comment out the line `ri_full_output .True.` in `control.in` to avoid writing the overlap matrix for the predicted structures.

sbatch run-aims.sbatch

The output files rho_scf.out contain the reference ab initio densities for each structure.

Then run

python -m salted.aims.get_ml_err

to compare `rho_ml.out` with `rho_scf.out`. The real-space integral of the absolute error is stored in `ml_maes` for each structure, and the total mean absolute error is printed to the terminal.
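As an illustration of what a real-space integral of the absolute error means on a numerical grid, the sketch below evaluates the quadrature sum of `w_i * |rho_ml(r_i) - rho_scf(r_i)|`. It is not the implementation of `salted.aims.get_ml_err`: the function name is hypothetical, and the assumption that densities and integration weights are available as equal-length sequences on a common grid is made only for this example.

```python
def integrated_abs_error(rho_ml, rho_ref, weights):
    """Quadrature estimate of the integral of |rho_ml - rho_ref|.

    All three arguments are equal-length sequences of floats defined on
    the same grid points (an assumption of this sketch).
    """
    if not (len(rho_ml) == len(rho_ref) == len(weights)):
        raise ValueError("densities and weights must have the same length")
    return sum(w * abs(a - b) for a, b, w in zip(rho_ml, rho_ref, weights))


if __name__ == "__main__":
    # Two grid points with weights 2.0 and 4.0; each density differs by 0.5.
    print(integrated_abs_error([1.0, 2.0], [0.5, 2.5], [2.0, 4.0]))  # 3.0
```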

Get physical properties

To get physical properties (like electrostatic energy, XC energy and total energies per atom) from AIMS output, run

python -m salted.aims.collect_energies

The properties are written to files predict_reference_*, with the predicted properties in the first column and reference values in the second column.
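Once the files are written, a quick summary of the agreement between the two columns can be useful. The sketch below assumes that each `predict_reference_*` file is plain text with whitespace-separated columns and no header other than optional `#` comment lines; this layout is an assumption, so adjust the parser if the files differ. The helper names are hypothetical.

```python
from pathlib import Path


def read_pred_ref(path):
    """Read predicted (column 1) and reference (column 2) values."""
    pred, ref = [], []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        # skip blank, short, or comment lines
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        pred.append(float(fields[0]))
        ref.append(float(fields[1]))
    return pred, ref


def mean_abs_error(pred, ref):
    """Mean absolute difference between predicted and reference values."""
    if not pred or len(pred) != len(ref):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


if __name__ == "__main__":
    # Summarize every matching file in the current directory.
    for f in sorted(Path(".").glob("predict_reference_*")):
        print(f.name, mean_abs_error(*read_pred_ref(f)))
```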