Intro

We keep getting weird issues with Psi4, which are only magnified by bigger molecules. Here, I examine how psi4 is utilizing memory to see if there are any glaring issues.

These experiments are done using an optimization that consistently fails using normal procedures. The optimization in QCArchive is 34754174. The “molecule” in question is:

Procedures/Scripts

Using psi4 1.4a3.dev63+afa0c21

I run the following in two terminals, one is the main psi4 running:

bash run-local.sh 34754174

and the other processes the output and generates a plot:

bash makeplot.sh 34754174

this will generate everything in a folder named 34754174. For the impatient, my plotting terminal actually runs the disk checker (see below), and the plotter periodically:

while sleep 60 ; do bash makeplot.sh 34754174 ; done &
while sleep 1; do
    if [ ! -e 34754174/*_psi_scratch ]; then
        continue;
    fi
    echo -n "$(date +%s ) "; du -s 34754174/*_psi_scratch | awk '{print $1}';
done >> 34754174/disk.dat &

In effect, running the first and last box in separate terminals should get you the plots I show below. The plot is 34754174/mem.png.

Running

Using this modified script from David Dotson to run a task (name it run-local.py):

#!/usr/bin/env python

import sys
import os
import pprint
from qcfractal.interface import FractalClient
from qcengine import compute_procedure

id = sys.argv[1]

# os.makedirs(id, exist_ok=True)

cl = FractalClient.from_file()
task = cl.query_tasks(base_result=id)[0]
print(task)

with open('task.json', 'w') as f:
    f.write(pprint.pformat(task.json()))

task.spec.args[0]['input_specification']['keywords']['scf_mem_safety_factor'] = 0.5
# task.spec.args[0]['input_specification']['keywords']['df_ints_num_threads'] = 1
# task.spec.args[0]['input_specification']['keywords']['diis'] = True
# task.spec.args[0]['input_specification']['model']['method'] = 'mp2'

result = compute_procedure(*task.spec.args, local_options={"memory": "16.0", "ncores": 8, "scratch_directory": os.getcwd() })
print("Computation done.")
print(result)

with open('result.json', 'w') as f:
    f.write(pprint.pformat(result.json()))

Noting that I added local_options={"memory": "16.0", "ncores": 8, "scratch_directory": os.getcwd() } to be able to control the cpu/mem given to psi4 and the scratch directory since my local machine uses a tmpfs as /tmp. There is also some record of how/what settings I was testing.

I drive this python script with the following bash script, which does the initialization and memory recording (name this run-local.sh):

#!/bin/bash
id=$1
d=$PWD

finish() {
	cd $d
	kill $(jobs -p)
}
export -f finish

mkdir -p $id
cd $id

trap finish SIGINT 
trap finish SIGCHLD 
python3 -u ../run-local.py $id &
pidstat -p ALL -l -C "$id" -r -h -H -u 1  | grep python | tee stat.log

Ideally the traps are there to stop pidstat once the background psi4 job finishes, but it’s a bit finicky (and I haven’t had too many jobs run until completion ). In any case, the pidstat command will create a memory log of the main process (e.g. the qcengine/geomeTRIC process) and the psi4 subprocess calls.

Plotting

I then have a similar setup for driving the plots. Here is the bash driver (name this makeplot.sh):

#!/bin/bash
id=$1
d=$PWD
cd $1

awk '/psi4/ {print $1 " " $13}' stat.log  | head -n -1 > psi4.dat
awk '/run-local/ {print $1 " " $13}' stat.log  | head -n -1 > main.dat
awk '/psi4/ {print $1 " " $22}' stat.log  | tr -d "[A-Z]" | head -n -1 > memlimit.dat

if [ -f out.log ] ; then
        awk '/Cache is/ {print $1 " " $4}' out.log  > cache.dat
fi

python3 $d/plot.py

cd $d

I do the head -n -1 funny business in case buffering produced an incomplete last line, which would make the numpy loader bork.

The python plotter (name this plot.py):

import matplotlib.pyplot as plt
import numpy as np
import os

fig = plt.figure(figsize=(12*4, 5), dpi=200)
ax = fig.add_subplot(111)

x, y = np.loadtxt("main.dat").T
xmin = x.min()

ax.plot(x-xmin, y / 1024 ** 2, "b", lw=0.3, label="geometric")

x, y = np.loadtxt("psi4.dat").T
ax.plot(x-xmin, y / 1024 ** 2, "r", lw=0.3, label="psi4")

x, y = np.loadtxt("memlimit.dat").T
ax.plot(x-xmin, y, "k", label="mem specified")

if os.path.exists("cache.dat"):
    x, y = np.loadtxt("cache.dat").T
    try:
        size_x = len(x)
        if size_x > 0:
            ax.plot(x-xmin, y / 1024 ** 3, "g", lw=0.3, label="B cache")
    except Exception:
        pass
if os.path.exists("disk.dat"):
    x, y = np.loadtxt("disk.dat").T
    try:
        size_x = len(x)
        if size_x > 0:
            ax.plot(x - xmin, y / 1024 ** 2, "g", lw=0.3, label="Disk space")
    except Exception:
        pass
ax.set_xlabel("Time (seconds)")
ax.set_ylabel("Memory (GB)")
ax.legend(loc=3)
ax.tick_params(labeltop=False, labelright=True)
ax.grid(axis='y')
fig.tight_layout()
fig.savefig("mem.png")

As can be seen from the plotter, I decided to record disk usage of the psi4 scratch about half way through the experiments. To enable this, I ran this in the background:

while sleep 1; do
    if [ ! -e 34754174/*_psi_scratch ]; then
        continue;
    fi
    echo -n "$(date +%s ) "; du -s 34754174/*_psi_scratch | awk '{print $1}';
done >> 34754174/disk.dat &

Not pretty since I wrote it as a quick one-liner, but works.

I originally wanted to look into the B matrix cache of geomeTRIC, since it is unbounded and for some other experimental jobs I ran into the warning seen below. I used a locally modifed geomeTRIC, which prints out the B matrix cache size whenever it is accessed (https://github.com/leeping/geomeTRIC/blob/master/geometric/internal.py#L1761):

    def wilsonB(self, xyz):
        """
        Given Cartesian coordinates xyz, return the Wilson B-matrix
        given by dq_i/dx_j where x is flattened (i.e. x1, y1, z1, x2, y2, z2)
        """
        global CacheWarning
        t0 = time.time()
        xhash = hash(xyz.tostring())
        ht = time.time() - t0
        if xhash in self.stored_wilsonB:
            ans = self.stored_wilsonB[xhash]
            return ans
        WilsonB = []
        Der = self.derivatives(xyz)
        for i in range(Der.shape[0]):
            WilsonB.append(Der[i].flatten())
        self.stored_wilsonB[xhash] = np.array(WilsonB)
        ##################################################################
        mem = 0.0
        for stored,mat in self.stored_wilsonB.items():
            mem += mat.itemsize * mat.size
        logger.info("{:20.4f} Cache is {:f}\n".format(time.time(), mem))
        ##################################################################
        if len(self.stored_wilsonB) > 1000 and not CacheWarning:
            logger.warning("\x1b[91mWarning: more than 1000 B-matrices stored, memory leaks likely\x1b[0m\n")
            CacheWarning = True
        ans = np.array(WilsonB)
        return ans

However, since I am able to roughly account for this by recording the memory of the entire QCEngine/geomeTRIC process in these experiments, I avoid its use.

Experiments

SCF_MEM_SAFETY_FACTOR

Setting SCF_MEM_SAFETY_FACTOR is affected by number of cores. If I set 0.38 for 2 cores, the memory usage will be higher if I run the same thing with 8 cores.

This is the error if the safety factor is set too low (.20 when mem=2GB):

FailedOperation(error=ComputeError(error_type='unknown', error_message='geomeTRIC run_json error:\nTraceback (most recent call last):\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/run_json.py", line 214, in geometric_run_json\n    geometric.optimize.Optimize(coords, M, IC, engine, None, params)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1331, in Optimize\n    return optimizer.optimizeGeometry()\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1293, in optimizeGeometry\n    self.calcEnergyForce()\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1002, in calcEnergyForce\n    spcalc = self.engine.calc(self.X, self.dirname)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 874, in calc\n    return self.calc_new(coords, dirname)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 866, in calc_new\n    raise QCEngineAPIEngineError("QCEngineAPI computation did not execute correctly. Message: " + ret["error"]["error_message"])\ngeometric.errors.QCEngineAPIEngineError: QCEngineAPI computation did not execute correctly. Message: QCEngine Unknown Error: Traceback (most recent call last):\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 411, in run_qcschema\n    ret_data = run_json_qcschema(input_model.dict(), clean, False, keep_wfn=keep_wfn)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 558, in run_json_qcschema\n    val, wfn = methods_dict_[json_data["driver"]](method, **kwargs)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/driver.py", line 717, in gradient\n    wfn = procedures[\'gradient\'][lowername](lowername, molecule=molecule, **kwargs)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 2383, in run_scf_gradient\n    ref_wfn = run_scf(name, **kwargs)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 2288, in run_scf\n    scf_wfn = scf_helper(name, post_scf=False, **kwargs)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 1568, in scf_helper\n    e_scf = scf_wfn.compute_energy()\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 83, in scf_compute_energy\n    self.initialize()\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 193, in scf_initialize\n    self.initialize_jk(self.memory_jk_, jk=jk)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 128, in initialize_jk\n    jk.initialize()\nRuntimeError: \nFatal Error: SCF::DF: Disk based algorithm requires 2 (A|B) fitting metrics and an (A|mn) chunk on core.\n         This is 2Q^2 + QNP doubles, where Q is the auxiliary basis size, N is the\n         primary basis size, and P is the maximum number of functions in a primary shell.\n         For this problem, that is 374074280 bytes before taxes,534391828 bytes after taxes. \n\nError occurred in file: /scratch/psilocaluser/conda-builds/psi4-multiout_1609773118680/work/psi4/src/psi4/libfock/DiskDFJK.cc on line: 727\nThe most recent 5 function calls were:\n\npsi::DiskDFJK::preiterations()\n\n\n'))

The formatted error is:

geomeTRIC run_json error:
Traceback (most recent call last):
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/run_json.py", line 214, in geometric_run_json
    geometric.optimize.Optimize(coords, M, IC, engine, None, params)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1331, in Optimize
    return optimizer.optimizeGeometry()
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1293, in optimizeGeometry
    self.calcEnergyForce()
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1002, in calcEnergyForce
    spcalc = self.engine.calc(self.X, self.dirname)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 874, in calc
    return self.calc_new(coords, dirname)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 866, in calc_new
    raise QCEngineAPIEngineError("QCEngineAPI computation did not execute correctly. Message: " + ret["error"]["error_message"])
geometric.errors.QCEngineAPIEngineError: QCEngineAPI computation did not execute correctly. Message: QCEngine Unknown Error: Traceback (most recent call last):
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 411, in run_qcschema
    ret_data = run_json_qcschema(input_model.dict(), clean, False, keep_wfn=keep_wfn)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 558, in run_json_qcschema
    val, wfn = methods_dict_[json_data["driver"]](method, **kwargs)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/driver.py", line 717, in gradient
    wfn = procedures['gradient'][lowername](lowername, molecule=molecule, **kwargs)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 2383, in run_scf_gradient
    ref_wfn = run_scf(name, **kwargs)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 2288, in run_scf
    scf_wfn = scf_helper(name, post_scf=False, **kwargs)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 1568, in scf_helper
    e_scf = scf_wfn.compute_energy()
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 83, in scf_compute_energy
    self.initialize()
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 193, in scf_initialize
    self.initialize_jk(self.memory_jk_, jk=jk)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 128, in initialize_jk
    jk.initialize()
RuntimeError: 
Fatal Error: SCF::DF: Disk based algorithm requires 2 (A|B) fitting metrics and an (A|mn) chunk on core.
         This is 2Q^2 + QNP doubles, where Q is the auxiliary basis size, N is the
         primary basis size, and P is the maximum number of functions in a primary shell.
         For this problem, that is 374074280 bytes before taxes,534391828 bytes after taxes. 

Error occurred in file: /scratch/psilocaluser/conda-builds/psi4-multiout_1609773118680/work/psi4/src/psi4/libfock/DiskDFJK.cc on line: 727
The most recent 5 function calls were:

psi::DiskDFJK::preiterations()

0.25 works, but goes beyond the 2GB limit.

Here is an example using a safety factor of .20, and .75 (the default) with 4GB

DIIS

Now, I know DIIS saves state, so it might be the cause of the ramp. Turning off, we see exactly no change in memory behavior, other than the iterations are slower and more are needed:

DFT vs HF

Now, B3LYP/DZVP should be pretty stable; is the dispersion affecting this at all? Tried it, no. Just for fun; does using HF change anything? YES!

How about wb97x?

Yes, there is a ramp up (this one failed due to no disk space; currently working with a 18 GB space). However the memory requirements seem to be somewhat higher than b3lyp.

How about blyp? Yes

How about M06? Yes

How about TPSS? Yes

How about MP2? No (but the memory jumps way up after the HF finishes; ran out of disk space causing it to crash)

DF_INTS_NUM_THREADS

What about DF_INTS_NUM_THREADS since the documentation says it could affect memory issues? Setting to 1 does seem to affect memory (note that all calculations are done using 8 cores):

There seems to only be a constant offset; the difference in the internal iterations is about the same as previous experiments.

Direct vs DF

Tried using SCF_TYPE as DIRECT , which turns off density fitting, recommended by Ben Pritchard. Documentation is https://psicode.org/psi4manual/master/scf.html#eri-algorithms

It is slow, and I am not patient enough to let this finish on my desktop as I have other things in the queue:

We see a slight overall ramp which is consistent with the CORE algorithm shown below. Will try again later when things are idle.

Let a full DIRECT run finish

Full Runs

CORE vs DISK algorithm

At this point; I am not sure what other options could affect memory. There seems to be a consistent ramp in memory for DFT; not sure if this is normal or not.

Now, let’s try to take the optimization out many steps using typical settings of 8 cores, 30GB. This was using the CORE algorithm, meaning it could be done completely in memory. We have:

and it does fail after 76 iterations with:

FailedOperation(error=ComputeError(error_type='unknown', error_message='geomeTRIC run_json error:\nTraceback (most recent call last):\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/run_json.py", line 214, in geometric_run_json\n    geometric.optimize.Optimize(coords, M, IC, engine, None, params)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1331, in Optimize\n    return optimizer.optimizeGeometry()\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1298, in optimizeGeometry\n    self.calcEnergyForce()\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1002, in calcEnergyForce\n    spcalc = self.engine.calc(self.X, self.dirname)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 874, in calc\n    return self.calc_new(coords, dirname)\n  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 866, in calc_new\n    raise QCEngineAPIEngineError("QCEngineAPI computation did not execute correctly. Message: " + ret["error"]["error_message"])\ngeometric.errors.QCEngineAPIEngineError: QCEngineAPI computation did not execute correctly. Message: QCEngine Unknown Error: Unknown error, error message is not found\n'))

geomeTRIC run_json error:
Traceback (most recent call last):
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/run_json.py", line 214, in geometric_run_json
    geometric.optimize.Optimize(coords, M, IC, engine, None, params)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1331, in Optimize
    return optimizer.optimizeGeometry()
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1298, in optimizeGeometry
    self.calcEnergyForce()
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/optimize.py", line 1002, in calcEnergyForce
    spcalc = self.engine.calc(self.X, self.dirname)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 874, in calc
    return self.calc_new(coords, dirname)
  File "/home/tgokey/.local/miniconda3/envs/psi4-test/lib/python3.7/site-packages/geometric/engine.py", line 866, in calc_new
    raise QCEngineAPIEngineError("QCEngineAPI computation did not execute correctly. Message: " + ret["error"]["error_message"])
geometric.errors.QCEngineAPIEngineError: QCEngineAPI computation did not execute correctly. Message: QCEngine Unknown Error: Unknown error, error message is not found

No memory errors. My machine has 31GB of memory and additional swap space, so if all is well there should have been enough memory to keep going even if virtual memory grabbed more than 31GB (I note that there is ~2GB more virtual than resident on average, so nothing crazy going on here). Let's try using the DISK algorithm by setting the safety factor lower to .5 and memory to 16 GB. Realistically, this should have zero memory issues; even virtual memory shouldn’t hit the 16 GB limit. Here is what I have so far:

Died at 94, RIP.

Also, oddly, the memory spikes at iteration 75; this spike may have been the cause for the other failure where the memory fluctuations are much higher. Also notice that the memory tends to decrease in this run, whereas it tended to increase in the CORE algorithm. Quite a difference in behavior.

Checking the output, it looks like everything is perfect, but the energy of the last psi4 run is None and there is no output from it. Everything seems to point at a psi4 calculation failing to start. Will try plugging in the last schema manually and seeing if psi4 accepts it. Just looking at the last structure shows nothing remarkable.

It does finish successfully using HF:

Trying to load the last qcschema molecule into the task and running it, with some print statements in the QCE Psi4 harness. Noting that execute is full of context managers and no exception handling. First two iterations went swimmingly, will see how long it lasts.

Plot the results of the complete optimization starting from the failed one above (iteration 94)

QCE/Psi4 notes