Water Dimer Calculation
Overview
This tutorial explains the workflow for Quantum Monte Carlo (QMC) calculations using TurboWorkflows, with Water Dimer as an example.
This tutorial is based on the sample presented in J. Chem. Phys. 159, 224801 (2023).
The files used in this tutorial can be found in file.tar.gz.
This tutorial performs QMC calculations for Water Dimer through the following steps:
PySCF Calculation: Generate initial wavefunction using DFT calculation
TREXIO Conversion: Convert PySCF results from TREXIO format to TurboRVB format
VMC Optimization: Optimize Jastrow factors
VMC Calculation: Perform VMC calculation with optimized wavefunction
LRDMC Calculation: Perform LRDMC calculations with multiple a values and extrapolate to a→0
Step 1: PySCF Calculation
As the first step, we perform a DFT calculation using PySCF to generate the initial wavefunction.
Input Files
water_dimer.xyz: Molecular structure file (XYZ format)
6
O -1.551007 -0.114520 0.000000
H -1.934259 0.762503 0.000000
H -0.599677 0.040712 0.000000
O 1.350625 0.111469 0.000000
H 1.680398 -0.373741 -0.758561
H 1.680398 -0.373741 0.758561
PySCF Calculation Script
pyscf_water_dimer.py performs DFT calculation with the following settings:
Basis set:
ccecp-ccpvtzECP:
ccecpMethod: DFT (LDA)
Output: PySCF checkpoint file (
pyscf.chk)
Execution Method
Execute the PySCF calculation using the do.sh script:
bash do.sh
This script performs the following operations:
Execute
pyscf_water_dimer.pyto perform PySCF calculationConvert PySCF checkpoint file (
pyscf.chk) to TREXIO format (water.hdf5)
#!/bin/sh
# Execute PySCF calculation
python3 pyscf_water_dimer.py 2>&1 | tee out.o
# Convert to TREXIO format
trexio convert-from -t pyscf -i pyscf.chk -b hdf5 water.hdf5
Output Files
pyscf.chk: PySCF checkpoint filewater.hdf5: Wavefunction data in TREXIO format (used in the next step)out.pyscf,out.o: PySCF calculation log output and error output
Note
This step will be simplified in the upcoming version of PySCF that directly generates outputs in TREXIO format.
Step 2: TurboRVB Workflow Execution
Use the task.py script to execute a series of workflows from conversion to TurboRVB format to the final LRDMC calculation.
Workflow Class Overview
task.py uses the following four workflow classes:
TREXIO_convert_to_turboWF: A workflow that converts TREXIO format files to TurboRVB format wavefunction (
fort.10). It converts results from electronic structure calculations such as PySCF into a format usable by TurboRVB and also initializes Jastrow factors.VMCopt_workflow: A workflow that performs VMC (Variational Monte Carlo) optimization. It optimizes wavefunction parameters (especially Jastrow factors) to generate a more accurate wavefunction. Through automatic continuation, it continues optimization until the target error bar is achieved.
VMC_workflow: A workflow that performs VMC calculations. It uses optimized wavefunctions to calculate physical quantities such as energy expectation values. Through automatic continuation, it continues calculation until the target error bar is achieved.
LRDMC_ext_workflow: A workflow that performs LRDMC (Lattice Regularized Diffusion Monte Carlo) calculations with multiple a values and extrapolates to a→0. It calculates energies at each a value and finally obtains the energy at the a→0 limit through extrapolation.
For detailed parameters of each workflow class, please refer to the reference manual.
Encapsulated_Workflow Class
In TurboWorkflows, each calculation workflow is wrapped by the Encapsulated_Workflow class. This class manages workflow execution in isolated directories and automatically handles input file copying and output file tracking.
Main parameters of Encapsulated_Workflow:
label: Workflow identifier. Used when referencing from other workflows.dirname: Directory name for workflow execution. Each workflow runs in an independent directory.input_files: List of input files. Can specify file paths (strings) orVariableobjects.workflow: Workflow instance that performs the actual calculation (e.g.,TREXIO_convert_to_turboWF,VMCopt_workflow).
In input_files, you can specify normal file paths (strings) or use Variable objects to reference outputs from other workflows.
Describing Dependencies with Variable Class
The Variable class is used to describe dependencies between workflows. By using Variable, you can automatically pass outputs from one workflow as inputs to another workflow.
Parameters of Variable:
label: Specifies thelabelof the source workflow.vtype: Specifies the variable type. One of the following:"file": Reference a single file"file-list": Reference a list of multiple files"value": Reference a calculated value (numeric, etc.)
name: Specifies the filename or value label to reference.
Locations where Variable can be used:
input_filesparameter: When specifying output files from other workflows as inputsinput_files=[ Variable(label="trexio-workflow", vtype="file", name="fort.10"), Variable(label="trexio-workflow", vtype="file", name="pseudo.dat"), ]
Within
workflowparameter: When using output values from other workflows as workflow parametersworkflow=LRDMC_ext_workflow( lrdmc_trial_etry=Variable(label="vmc-workflow", vtype="value", name="energy"), ... )
The Launcher class automatically resolves Variable objects, retrieves appropriate file paths or values, and passes them to workflows. This automatically manages dependencies between workflows and executes them in the appropriate order.
Workflow Configuration
task.py executes the following four workflows sequentially:
1. Conversion from TREXIO to TurboRVB Format
# Convert from a TREXIO file to a WF with TurboRVB format
trexio_workflow = Encapsulated_Workflow(
label="trexio-workflow",
dirname="trexio-workflow",
input_files=[
os.path.join(root_dir, "water.hdf5")
],
workflow=TREXIO_convert_to_turboWF(
trexio_filename="water.hdf5",
jastrow_basis_dict=jastrow_basis_dict,
),
)
This workflow:
Takes
water.hdf5as inputDefines Jastrow basis functions (for H and O atoms)
Generates TurboRVB format wavefunction file (
fort.10)
In this example, a normal file path (os.path.join(root_dir, "water.hdf5")) is specified in input_files. This is an example of using external files (PySCF calculation results) as input. On the other hand, subsequent workflows use Variable to reference outputs from previous workflows.
2. VMC Optimization
# VMC optimization of Jastrow factor
# One-, two-, and three-body Jastrow are optimized
vmcopt_workflow = Encapsulated_Workflow(
label="vmcopt-workflow",
dirname="vmcopt-workflow",
input_files=[
Variable(label="trexio-workflow", vtype="file", name="fort.10"),
Variable(label="trexio-workflow", vtype="file", name="pseudo.dat"),
],
workflow=VMCopt_workflow(
# cluster information
server_machine_name="localhost",
queue_label="default",
sleep_time=60,
mpi=True,
# vmc optimization parameters
vmcopt_max_continuation=2,
vmcopt_num_walkers=128,
vmcopt_target_error_bar=7.5e-3,
vmcopt_trial_optsteps=10,
vmcopt_trial_steps=50,
vmcopt_production_optsteps=40,
vmcopt_optwarmupsteps_ratio=0.8,
vmcopt_bin_block=1,
vmcopt_warmupblocks=0,
vmcopt_optimizer="lr",
vmcopt_learning_rate=0.35,
vmcopt_regularization=0.001,
vmcopt_onebody=True,
vmcopt_twobody=True,
vmcopt_det_mat=False,
vmcopt_jas_mat=True,
vmcopt_det_basis_exp=False,
vmcopt_jas_basis_exp=False,
vmcopt_det_basis_coeff=False,
vmcopt_jas_basis_coeff=False,
vmcopt_maxtime=3600,
),
)
This workflow:
Uses
fort.10andpseudo.datgenerated in the previous step as inputOptimizes one-body, two-body, and three-body Jastrow factors
Outputs optimized wavefunction
Here, Variable(label="trexio-workflow", vtype="file", name="fort.10") is used to reference the fort.10 file generated by trexio-workflow. The Launcher automatically retrieves the fort.10 file from the output directory after trexio-workflow execution completes and copies it as input for this workflow.
Job execution parameters:
server_machine_name="localhost": Name of the server machine where jobs will run. Specify a machine name defined inmachine_data.yaml. In this example,"localhost"(execution on local machine) is specified.queue_label="default": Queue label for the job queue system. References queue settings defined inqueue_data.toml.sleep_time=60: Wait time (seconds) between job status checks. While waiting for job completion, status is checked at this interval.mpi=True: Whether to use MPI version of the binary. IfTrue, parallel computation is possible.
Calculation parameters:
vmcopt_num_walkers=128: Number of walkersvmcopt_target_error_bar=7.5e-3: Target error barvmcopt_optimizer="lr": Optimization method (linear method with conjugate gradient)(See the reference manual for other parameters.)
3. VMC Calculation
# VMC calculation with the optimized WF
vmc_workflow = Encapsulated_Workflow(
label="vmc-workflow",
dirname="vmc-workflow",
input_files=[
Variable(label="vmcopt-workflow", vtype="file", name="fort.10"),
Variable(label="vmcopt-workflow", vtype="file", name="pseudo.dat"),
],
workflow=VMC_workflow(
# cluster information
server_machine_name="localhost",
queue_label="default",
sleep_time=60,
mpi=True,
# vmc parameters
vmc_max_continuation=2,
vmc_num_walkers=128,
vmc_target_error_bar=5.0e-3,
vmc_trial_steps=150,
vmc_bin_block=10,
vmc_warmupblocks=5,
vmc_maxtime=3600,
),
)
This workflow:
Performs VMC calculation using optimized wavefunction
Calculates energy expectation value
Results are used as initial energy for the next LRDMC calculation
This workflow uses the wavefunction (fort.10) optimized by vmcopt-workflow. The energy value obtained as a result of VMC calculation is saved in output_values with the key "energy" and is referenced in the next LRDMC calculation as Variable(label="vmc-workflow", vtype="value", name="energy").
In this example, job execution parameters such as server_machine_name="localhost", queue_label="default", sleep_time=60, and mpi=True are set. These parameters should be set appropriately according to the computational environment used, similar to VMC optimization.
4. LRDMC Calculation and a→0 Extrapolation
# LRDMC calculations with the optimized WF
# LRDMC energies are computed with a = 0.20, 0.30, 0.40 and 0.50,
# and then, extrapolated to a->0 limit.
lrdmc_ext_workflow = Encapsulated_Workflow(
label=f"lrdmc-ext-workflow",
dirname=f"lrdmc-ext-workflow",
input_files=[
Variable(label="vmcopt-workflow", vtype="file", name="fort.10"),
Variable(label="vmcopt-workflow", vtype="file", name="pseudo.dat"),
],
workflow=LRDMC_ext_workflow(
# cluster information
server_machine_name="localhost",
queue_label="default",
sleep_time=60,
mpi=True,
# lrdmc parameters
lrdmc_max_continuation=2,
lrdmc_num_walkers=128,
lrdmc_target_error_bar=5.0e-3,
lrdmc_trial_steps=150,
lrdmc_bin_block=10,
lrdmc_warmupblocks=5,
lrdmc_correcting_factor=10,
lrdmc_trial_etry=Variable(label="vmc-workflow", vtype="value", name="energy"),
lrdmc_alat_list=[-0.20, -0.30, -0.40, -0.50],
lrdmc_nonlocalmoves="tmove",
lrdmc_maxtime=3600,
),
)
This workflow:
Performs LRDMC calculations with multiple a values (0.20, 0.30, 0.40, 0.50)
Uses VMC calculation energy as initial value
Calculates energy at each a value and performs extrapolation to a→0
In this example, not only are file dependencies specified in input_files, but also Variable(label="vmc-workflow", vtype="value", name="energy") is specified in lrdmc_trial_etry within the workflow parameter to use the energy value obtained from VMC calculation as the initial value for LRDMC calculation. This allows not only files but also calculated values to be passed between workflows.
Job execution parameters (server_machine_name, queue_label, sleep_time, mpi, etc.) are set similarly to VMC optimization and VMC calculation.
Workflow Execution
Register all workflows with the Launcher class and execute:
# add the workflows to the Launcher class
cworkflows_list = [
trexio_workflow,
vmcopt_workflow,
vmc_workflow,
lrdmc_ext_workflow,
]
launcher = Launcher(cworkflows_list=cworkflows_list, dependency_graph_draw=True)
# Launch the jobs
launcher.launch()
Execution Method
python task.py
Workflow Dependencies
Workflows are executed with the following dependencies:
water.hdf5
↓
trexio-workflow (TREXIO → TurboRVB conversion)
↓
vmcopt-workflow (VMC optimization)
↓
vmc-workflow (VMC calculation) ──┐
↓ │
lrdmc-ext-workflow (LRDMC calculation and extrapolation)
The Launcher automatically resolves dependencies and executes workflows in the appropriate order.
Output Directory Structure
After execution, results are saved in the results/ directory with the following structure:
results/
├── trexio-workflow/
│ ├── fort.10 # TurboRVB format wavefunction
│ └── pseudo.dat # Pseudopotential file
├── vmcopt-workflow/
│ ├── fort.10 # Optimized wavefunction
│ └── ...
├── vmc-workflow/
│ └── ... # VMC calculation results
└── lrdmc-ext-workflow/
└── ... # LRDMC calculation results and extrapolation results
Parameter Adjustment
Job Execution Parameters
For calculation workflows such as VMC, VMC optimization, and LRDMC, the following job execution parameters need to be set:
server_machine_name: Name of the server machine where jobs will run. Specify a machine name defined inmachine_data.yaml. Default is"localhost"(execution on local machine). In thetask.pyexample, this parameter is explicitly set to"localhost". When executing on a remote server, this parameter needs to be set to an appropriate machine name.queue_label: Queue label for the job queue system (PBS, SLURM, etc.). References queue settings defined inqueue_data.toml. In thetask.pyexample,"default"is used.mpi: Whether to use MPI version of the binary. IfTrue, parallel computation becomes possible. Default isFalse.version: Version of TurboRVB to use. Default is"stable".sleep_time: Wait time (seconds) between job status checks. While waiting for job completion, status is checked at this interval. Default is1800(30 minutes).
These parameters need to be set appropriately according to the computational environment used. When executing on a remote server, set server_machine_name to an appropriate machine name and define the machine settings (IP address, SSH connection information, queue system settings, etc.) in machine_data.yaml.
machine_data.yaml is a configuration file that defines settings for each server machine. This file contains information such as machine type (local/remote), IP address, whether to use queue system, job management commands (jobsubmit, jobcheck, jobdel, etc.). Similarly, queue_data.toml defines settings for each queue (number of CPUs, memory, execution time limit, etc.).
Jastrow Basis Functions
Jastrow basis functions can be defined in jastrow_basis_dict in task.py:
# dictionary of Jastrow basis (GAMESS format)
jastrow_basis_dict = {
"H": """
S 1
1 1.873529 1.00000000
S 1
1 0.802465 1.00000000
S 1
1 0.147217 1.00000000
""",
"O": """
S 1
1 1.686633 1.00000000
S 1
1 0.237997 1.00000000
S 1
1 0.125346 1.00000000
P 1
1 1.331816 1.00000000
""",
}
VMC Optimization Parameters
By adjusting parameters of VMCopt_workflow, you can control optimization accuracy and computation time:
vmcopt_num_walkers: Number of walkers (affects calculation accuracy)vmcopt_target_error_bar: Target error bar (smaller is more accurate)vmcopt_learning_rate: Learning rate (affects optimization convergence speed)
LRDMC Parameters
Parameters of LRDMC_ext_workflow:
lrdmc_alat_list: List of a values to calculatelrdmc_target_error_bar: Target error barlrdmc_nonlocalmoves: Non-local move method (“tmove”, etc.)
Troubleshooting
When PySCF Calculation Fails
Check the format of
water_dimer.xyzVerify that PySCF and required packages are correctly installed
Check that computational resources (memory, CPU) are sufficient
When TREXIO Conversion Fails
Check that
water.hdf5is correctly generatedVerify that TREXIO is correctly installed
When Workflow Fails
Check log files for each workflow
Verify that dependencies are correctly set
Check computational resources (especially cluster settings)
Summary
In this tutorial, we performed QMC calculations for Water Dimer through the following steps:
Generate initial wavefunction with PySCF calculation (
do.sh)Execute a series of QMC calculations with TurboWorkflows (
task.py)Conversion from TREXIO to TurboRVB format
VMC optimization
VMC calculation
LRDMC calculation and a→0 extrapolation
Results from each step are saved in the results/ directory and used as input for the next step. The Launcher class automatically manages dependencies and executes workflows in the appropriate order.