98 Wavefunction optimization

00 What does the VMC optimization do?

Optimizing a many-body wavefunction at the VMC level is the most difficult operation in quantum Monte Carlo simulations. This document describes how to do it in practice. You can also refer to the optimization part (VII. Optimization of WFs) of the review article.

In variational Monte Carlo (VMC), we evaluate the following integral using the Markov-chain Monte Carlo method.

E = \cfrac{\int d \vec{R} \cdot \Psi^{*} \left( \vec{R} \right) \cdot \hat{\cal{H}} \Psi \left( \vec{R} \right)}{\int d \vec{R} \cdot \Psi^{*} \left( \vec{R} \right) \Psi \left( \vec{R} \right)}

According to the variational principle, the closer the wavefunction is to the exact one, the lower the energy becomes. Therefore, once we can evaluate the above integral for a wavefunction with variational parameters \vec{\alpha}, we can approach the exact solution by minimizing the energy with respect to those parameters:

E\left(\vec{\alpha}\right) = \cfrac{\int d \vec{R} \cdot \Psi^{*} \left( \vec{R}, \vec{\alpha} \right) \cdot \hat{\cal{H}} \Psi \left( \vec{R}, \vec{\alpha} \right)}{\int d \vec{R} \cdot \Psi^{*} \left( \vec{R}, \vec{\alpha} \right) \Psi \left( \vec{R}, \vec{\alpha} \right)} \ge E_0
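
In a VMC code, this energy is estimated as the statistical average of the local energy E_L(\vec{R}) = \hat{\mathcal{H}}\Psi(\vec{R})/\Psi(\vec{R}) over configurations sampled from |\Psi|^2. The following minimal Python sketch (purely illustrative, not part of TurboRVB; local_energy and samples are hypothetical placeholders) shows the idea:

    import numpy as np

    def vmc_energy(local_energy, samples):
        """Estimate E(alpha) as the average local energy over MCMC samples.

        local_energy : callable returning E_L(x) = (H Psi)(x) / Psi(x)
        samples      : configurations x_i drawn from |Psi(x)|^2
        """
        e_loc = np.array([local_energy(x) for x in samples])
        mean = e_loc.mean()
        # naive error bar, assuming uncorrelated samples
        error = e_loc.std(ddof=1) / np.sqrt(len(e_loc))
        return mean, error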

01 Optimization methods

Two major optimization methods are implemented in TurboRVB: the so-called stochastic reconfiguration (SR) method (itestr4 = -9, -5) and the so-called linear method (LR) with conjugate gradient (itestr4 = -4, -8). Notice that -9 and -4 do not optimize the exponents of the determinant part, while -5 and -8 do.

In general, the LR method works efficiently and reliably when the number of variational parameters is not too large, typically fewer than about 1000. If you need to optimize more than ~1000 variational parameters, we recommend the SR method.

02 Important hyperparameters

The most important parameters in practice are

  • tpar: acceleration parameter (i.e., the learning rate)

  • parr: regularization (cf. LASSO)

tpar

For example, in the stochastic reconfiguration (SR) method, the variational parameters are updated as:

{\alpha _k} \to {\alpha _k} + \Delta \cdot {\left( {{{{\boldsymbol{\mathcal{S'}}}}^{ - 1}}{\mathbf{f}}} \right)_k},

where \Delta is tpar, {\mathbf{f}} is the generalized force vector, and {\boldsymbol{\mathcal{S'}}} is a regularized variance-covariance matrix (the prime is explained in the parr section below). The bare matrix {\boldsymbol{\mathcal{S}}} is stochastically evaluated by means of M configuration samples {\mathbf{x}} = \left\{ {{{\mathbf{x}}_1},{{\mathbf{x}}_2}, \ldots ,{{\mathbf{x}}_M}} \right\}:

{{\mathcal{S}}_{k,k'}} = \left[ {\frac{1}{M}\sum\limits_{i = 1}^M {\left( {{O_k}\left( {{{\mathbf{x}}_i}} \right) - {{\bar O}_k}} \right) ^ * \left( {{O_{k'}}\left( {{{\mathbf{x}}_i}} \right) - {{\bar O}_{k'}}} \right)} } \right],

where {O_k}\left( {{{\mathbf{x}}_i}} \right) = \frac{{\partial \ln \Psi \left( {{{\mathbf{x}}_i}} \right)}}{{\partial {\alpha _k}}} and {{\bar O}_k} = \frac{1}{M}\sum\limits_{i = 1}^M {{O_k}\left( {{{\mathbf{x}}_i}} \right)}.
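
Both {\boldsymbol{\mathcal{S}}} and {\mathbf{f}} can be estimated from the same set of samples. Below is a minimal Python sketch of these estimators (illustrative only, not TurboRVB code); for a real wavefunction the force can be written as f_k = -2\left\langle (E_L - \langle E_L\rangle)(O_k - \bar{O}_k)\right\rangle:

    import numpy as np

    def sr_matrices(O, e_loc):
        """Estimate the covariance matrix S and the forces f from M samples.

        O     : (M, p) array with O[i, k] = d ln Psi(x_i) / d alpha_k
        e_loc : (M,)   array with the local energies E_L(x_i)
        """
        dO = O - O.mean(axis=0)             # O_k(x_i) - <O_k>
        S = dO.conj().T @ dO / len(e_loc)   # S_{k,k'}
        de = e_loc - e_loc.mean()           # E_L(x_i) - <E_L>
        f = -2.0 * (dO * de[:, None]).mean(axis=0)   # f_k = -dE/dalpha_k (real WF)
        return S, f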

parr

The straightforward implementation of the SR method is not stable, mainly because statistical noise sometimes makes the matrix {\boldsymbol{\mathcal{S}}} ill-conditioned, which deteriorates the efficiency of the optimization. Therefore, in practice, the diagonal elements of the preconditioning matrix {\boldsymbol{\mathcal{S}}} are scaled by a small positive parameter (\varepsilon) as:

{s'_{i,i}} = {s_{i,i}}(1 + \varepsilon ),

where \varepsilon = parr in TurboRVB.

This modification improves the efficiency of the optimization by several orders of magnitude.
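
A sketch of the resulting update, continuing the Python illustration above (delta and eps play the roles of tpar and parr):

    import numpy as np

    def sr_update(alpha, S, f, delta, eps):
        """One SR step with the diagonal regularization s'_{ii} = s_{ii} (1 + eps)."""
        S_reg = S.copy()
        S_reg[np.diag_indices_from(S_reg)] *= (1.0 + eps)
        # alpha_k -> alpha_k + delta * (S'^{-1} f)_k
        return alpha + delta * np.linalg.solve(S_reg, f)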

03 Order of optimizing variational parameters

The so-called Andrea Zen's method empirically works very well for the LR method. Indeed, to avoid getting trapped in local minima of the Jastrow factor, the developers have found that it is important to optimize only the one-body Jastrow part at the beginning. Following this procedure, one should optimize the variational parameters in the order below.

  1. Put a reasonable two-body Jastrow parameter (typically ~ 1.0 for -6, -15, -22),

  2. Optimize the homogeneous and inhomogeneous one-body Jastrows (two-body and three-body are fixed),

  3. Optimize three-body Jastrows (two-body fixed), and

  4. Optimize two-body Jastrow(s) and the determinant part,

wherein, for steps 2-4 respectively (see the input sketch after this list),

  1. One should set iesfree=1 and iesd=1, and put twobodyoff=.true. and iesdtwobodyoff=.true. in the &parameters section,

  2. One should remove iesdtwobodyoff=.true. option, and

  3. One should remove twobodyoff=.true. and iesdtwobodyoff=.true., and set iessw=1 (determinant optimization).
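
As a rough sketch, and assuming all of the keywords above live in the &parameters section (only the keywords mentioned here are shown; all other mandatory keywords of a real TurboRVB input are omitted, and the exact layout may differ), the input would evolve across steps 2-4 roughly as:

    ! step 2: one-body Jastrow only
    &parameters
      iesfree=1
      iesd=1
      twobodyoff=.true.
      iesdtwobodyoff=.true.
    /

    ! step 3: three-body Jastrow (two-body still fixed)
    &parameters
      iesfree=1
      iesd=1
      twobodyoff=.true.
    /

    ! step 4: two-body Jastrow and determinant part
    &parameters
      iesfree=1
      iesd=1
      iessw=1
    /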

04 Criterion of optimization convergence

In practice, during an optimization, the code monitors the variational energy (E\left( \boldsymbol{\alpha} \right)) and the maximum signal-to-noise ratio among all the force components, which is denoted as devmax in the code:

devmax \equiv \max_k \left( {\left| {\frac{{{f_k}}}{{{\sigma _{{f_k}}}}}} \right|} \right)

where {\sigma _{{f_k}}} represents the estimated error bar of a general force {f_k} =- \frac{{\partial E\left( \alpha \right)}}{{\partial {\alpha _k}}} = - \frac{\partial }{{\partial {\alpha _k}}}\frac{{\braket{{\Psi _\alpha }|\hat {\mathcal{H}}|{\Psi _\alpha }}}}{{\braket{{\Psi _\alpha }|{\Psi _\alpha }}}}.
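
In code, devmax is simply the largest ratio between each force component and its error bar; a minimal Python illustration mirroring the definition above:

    import numpy as np

    def compute_devmax(f, sigma_f):
        """Maximum signal-to-noise ratio among all force components."""
        return np.max(np.abs(f / sigma_f))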

You can plot the energies with plot_Energy.sh:

[figure: energy.png]

You can also plot devmax with plot_devmax.sh:

[figure: devmax.png]

The developers' experience is that, at the very least, devmax should be smaller than 4.0 after the optimization. However, they have also found that this simple criterion is not sufficient to guarantee a converged result.

In addition, the developers recently check whether the two-body Jastrow and the inhomogeneous one-body Jastrow parameters are also converged. This seems to be an empirically good criterion of optimization convergence, though it is still under debate.

05 Hyperparameters in the optimization methods

There are several hyperparameters in the optimization methods. Although the proper choice for some of them is still under debate, we give a tentative guideline below.

05-01 nweight

nweight is the number of Monte Carlo samplings per optimization step. For the LR method, the number of VMC samplings should be much larger than the number of variational parameters, i.e., nweight \times (number of MPI tasks) > 5 \sim 10 \times p, where p is the number of variational parameters.
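
For instance, with purely hypothetical numbers (p = 4000 variational parameters and 128 MPI tasks), the rule of thumb above translates to:

    p = 4000          # number of variational parameters (hypothetical)
    ntasks = 128      # number of MPI tasks (hypothetical)
    nweight_min = 5 * p // ntasks    # ~156 samples per task per optimization step
    nweight_max = 10 * p // ntasks   # ~312 samples per task per optimization step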

For the SR method, nweight \times (number of MPI tasks) can be comparable to, or even smaller than, p as long as parr is set sufficiently large.

05-02 tpar

tpar is an acceleration hyperparameter in the optimization, corresponding to \Delta in Eq. 131 and Eq. 139 of the review paper for the LR and SR methods, respectively. In the machine learning community, tpar is also called the learning rate. For the LR method, tpar = 0.35 usually works well. For the SR method, one should set tpar much smaller, typically 1.0d-4.

adjust_tpar is a useful option, recently introduced by Andrea Tirelli, to find an optimal tpar. If adjust_tpar is set to true, tpar gradually increases as the optimization proceeds, after 100 equilibration iterations.
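
The exact schedule is internal to TurboRVB; the following Python snippet is only a generic illustration of such a ramp, not the actual adjust_tpar implementation:

    def ramped_tpar(iteration, tpar0, n_equil=100, growth=1.02):
        """Keep tpar fixed for n_equil iterations, then let it grow gradually."""
        if iteration <= n_equil:
            return tpar0
        return tpar0 * growth ** (iteration - n_equil)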

beta_learning is also a useful option to stabilize the optimization; beta_learning = 0.90 seems to be a good starting point.

05-03 parr

parr is a regularization parameter which is added to the diagonal elements of the preconditioning matrix S (Eq. 128 of the review paper). In the LR method, XXX

KN is now working…

05-04 ncg

This works only for the LR method.

KN is now working…

05-05 npbra and parcurpar

This works only for the LR method.

KN is now working…