  • Identify optimization specimens reproducing “Unknown Errors” in production

  • Detail next steps for investigation and action to:

    • improve error reporting in QCEngine Psi4Harness

    • share problem cases with psi4 developers / quantum chemists to determine possible solutions

Discussion topics

Item

Presenter

Notes

Optimization failure specimen

Pavan

  • Optimization ID: 34752921

    • No psi4 output, fails on first iteration; geomeTRIC then chokes on no data to operate on

    • psi4 log from file is truncated; consistent with psi4 dying abruptly

    • Used --messy in Psi4Harness to preserve file outputs

    • Also put together a script to run a single-point calculation; should produce the same result

    • in error cycling, we observe this one yielding an SCF convergence error in at least one case and an unknown error in at least one case

  • Optimization ID: 34752766

    • consistently shows up as unknown error or timed out on error cycling

  • Optimization ID: 34754174

    • consistently shows up as unknown error in error cycling

Script for reproducing results

Pavan

Code Block
# only the names used below are imported
import time

import qcengine
from qcelemental.models import AtomicInput
from qcelemental.models.common_models import Model

# molecule for the failing specimen, in QCSchema form
qcel_mol = dict({'schema_name': 'qcschema_molecule', 'schema_version': 2, 'validated': True,
                 'symbols': ['O', 'O', 'O', 'O', 'O', 'C', 'C', 'C', 'C', 'N', 'N', 'N', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'N', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H'],
                 'geometry': [[-6.85549362e+00,  5.47263221e+00, -7.91620020e-01],
       [-6.09996888e+00,  3.73272044e+00,  5.36105059e+00],
       [-2.25309022e+00, -5.16413070e-01, -2.82277346e+00],
       [ 1.06456093e+00, -5.02849477e+00,  2.85972315e+00],
       [-1.02057886e+01,  3.19498085e+00,  6.38068449e+00],
       [-7.63636902e+00,  3.58961518e+00, -2.08465129e+00],
       [-7.99626705e+00,  2.34278525e+00,  5.17868413e+00],
       [-2.43434662e+00, -1.84480750e-01, -4.83915730e-01],
       [ 2.62393077e+00, -4.15006357e+00,  1.28072634e+00],
       [ 4.77578996e+00, -5.58614171e+00,  7.70056030e-01],
       [-6.66219637e+00,  1.17397580e+00, -1.46107784e+00],
       [-3.60192510e-01, -7.28054700e-01,  1.06375098e+00],
       [-9.47728705e+00,  4.04038190e+00, -4.11757290e+00],
       [-8.17540217e+00, -7.49641700e-02,  3.85301030e+00],
       [ 6.63575661e+00, -3.68688560e-01,  1.36631900e-01],
       [ 6.78411089e+00, -5.11592810e-01, -2.71665048e+00],
       [-5.85267366e+00, -9.63103850e-01,  2.56608701e+00],
       [ 3.99821435e+00,  2.29254750e-01,  9.28320550e-01],
       [ 9.48534052e+00, -1.11783055e+00, -3.44108157e+00],
       [-4.86248398e+00,  7.77298940e-01,  5.35042790e-01],
       [ 2.04476167e+00, -1.67525524e+00,  1.01865120e-01],
       [ 1.11301882e+01,  8.54402510e-01, -2.50661962e+00],
       [ 5.77336114e+00, -6.44943022e+00,  2.21597558e+00],
       [ 5.40542122e+00, -5.80792492e+00, -1.06924251e+00],
       [-7.28527743e+00, -3.66056620e-01, -2.50030800e+00],
       [-4.68861140e-01, -4.65824930e-01,  3.01559756e+00],
       [-1.08410818e+01,  2.48751058e+00, -4.39050708e+00],
       [-8.50930310e+00,  4.37890762e+00, -5.96685880e+00],
       [-1.05931624e+01,  5.74501939e+00, -3.62226351e+00],
       [-8.64203211e+00, -1.54344897e+00,  5.33175395e+00],
       [-9.87294500e+00, -1.17790500e-02,  2.62794374e+00],
       [ 7.80117252e+00,  1.34966238e+00,  6.51788560e-01],
       [ 7.46705658e+00, -2.02231415e+00,  1.03913464e+00],
       [ 6.19573927e+00,  1.36697353e+00, -3.47446245e+00],
       [ 5.52569530e+00, -1.97528834e+00, -3.49426273e+00],
       [-4.39623371e+00, -1.40984956e+00,  3.99569247e+00],
       [-6.30394383e+00, -2.78542805e+00,  1.61426803e+00],
       [ 3.80986311e+00,  4.27550830e-01,  3.00371152e+00],
       [ 3.47512355e+00,  2.13457946e+00,  1.47040350e-01],
       [ 9.53549828e+00, -1.16640291e+00, -5.55563998e+00],
       [ 9.98238317e+00, -2.95885003e+00, -2.62218022e+00],
       [-4.35350739e+00,  2.60267722e+00,  1.50390262e+00],
       [ 1.98932809e+00, -1.89377873e+00, -1.98558053e+00],
       [ 1.00910926e+01,  2.49700644e+00, -2.22755224e+00],
       [ 1.19005249e+01,  2.01640330e-01, -8.20297820e-01],
       [ 1.26429939e+01,  1.16188463e+00, -3.74007670e+00]],
        'name': 'C13H24N4O5',
        'identifiers': {'molecule_hash': 'fa1a64790c34b63c846295fa43a3a9b52777626b', 'molecular_formula': 'C13H24N4O5'},
        'molecular_charge': 0.0,
        'molecular_multiplicity': 1,
        'connectivity': [(0, 5, 2.0), (1, 6, 2.0), (2, 7, 2.0), (3, 8, 2.0), (4, 6, 1.0), (5, 10, 1.0), (5, 12, 1.0), (6, 13, 1.0), (7, 11, 1.0), (7, 19, 1.0), (8, 9, 1.0), (8, 20, 1.0), (9, 22, 1.0), (9, 23, 1.0), (10, 19, 1.0), (10, 24, 1.0), (11, 20, 1.0), (11, 25, 1.0), (12, 26, 1.0), (12, 27, 1.0), (12, 28, 1.0), (13, 16, 1.0), (13, 29, 1.0), (13, 30, 1.0), (14, 15, 1.0), (14, 17, 1.0), (14, 31, 1.0), (14, 32, 1.0), (15, 18, 1.0), (15, 33, 1.0), (15, 34, 1.0), (16, 19, 1.0), (16, 35, 1.0), (16, 36, 1.0), (17, 20, 1.0), (17, 37, 1.0), (17, 38, 1.0), (18, 21, 1.0), (18, 39, 1.0), (18, 40, 1.0), (19, 41, 1.0), (20, 42, 1.0), (21, 43, 1.0), (21, 44, 1.0), (21, 45, 1.0)],
        'fix_com': True, 'fix_orientation': True, 'fix_symmetry': 'c1',
        'provenance': {'creator': 'QCElemental', 'version': 'v0.17.0', 'routine': 'qcelemental.molparse.from_schema'},
        'id': '24773736', 'extras': {'canonical_isomeric_explicit_hydrogen_mapped_smiles': '[O:1]=[C:6]([N:11]([C@:20]([C:8](=[O:3])[N:12]([C@:21]([C:9](=[O:4])[N:10]([H:23])[H:24])([C:18]([C:15]([C:16]([C:19]([N+:22]([H:44])([H:45])[H:46])([H:40])[H:41])([H:34])[H:35])([H:32])[H:33])([H:38])[H:39])[H:43])[H:26])([C:17]([C:14]([C:7](=[O:2])[O-:5])([H:30])[H:31])([H:36])[H:37])[H:42])[H:25])[C:13]([H:27])([H:28])[H:29]'}})
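# level of theory for the single-point calculation
psi4_model = Model(method="B3LYP-D3BJ", basis="DZVP")
start = time.time()
# single-point energy task with the SCF iteration limit set to 300
qc_task = AtomicInput(molecule=qcel_mol, driver="energy", model=psi4_model,
                      keywords={'maxiter': 300, 'scf_properties': ['dipole', 'quadrupole', 'wiberg_lowdin_indices', 'mayer_indices']})
# compute the energy with psi4 through QCEngine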
result = qcengine.compute(input_data=qc_task, program="psi4")
end = time.time()
print("Time taken for one single point energy calculation is:", end - start)
print(result)
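
The script above prints whatever `qcengine.compute` returns. A minimal sketch of separating the success and failure cases follows. It assumes only that the returned object exposes `success`, `return_result`, and `error.error_type` / `error.error_message`, as the QCElemental result models do. `describe_outcome` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the sketch runs without psi4 installed.

```python
from types import SimpleNamespace


def describe_outcome(result) -> str:
    """Return a one-line summary of a compute result (hypothetical helper)."""
    if getattr(result, "success", False):
        return f"success: return_result={result.return_result}"
    error = result.error
    # Keep only the first line; the full traceback can run to many lines.
    lines = error.error_message.splitlines()
    first_line = lines[0] if lines else ""
    return f"failed: {error.error_type}: {first_line}"


# Stand-in objects (assumptions, not real psi4 output) to exercise both branches.
ok = SimpleNamespace(success=True, return_result=-1.0)
failed = SimpleNamespace(
    success=False,
    error=SimpleNamespace(
        error_type="unknown_error",
        error_message="QCEngine Unknown Error: Traceback (most recent call last):\n...",
    ),
)

print(describe_outcome(ok))
print(describe_outcome(failed))
```

In the script itself, `print(describe_outcome(result))` could replace `print(result)` when only the error type and headline are of interest.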

Reproducing result

David

  • Attempting to reproduce 34752921 on a local machine, so that the result is reproduced independently by two different people

    • Getting the following from above script on psi4 1.4a3.dev63+afa0c21:

    • Code Block
      Time taken for one single point energy calculation is: 491.5916678905487
      FailedOperation(error=ComputeError(error_type='unknown_error', error_message='QCEngine Unknown Error: Traceback (most recent call last):\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 411, in run_qcschema\n    ret_data = run_json_qcschema(input_model.dict(), clean, False, keep_wfn=keep_wfn)\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 558, in run_json_qcschema\n    val, wfn = methods_dict_[json_data["driver"]](method, **kwargs)\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/driver.py", line 576, in energy\n    wfn = procedures[\'energy\'][lowername](lowername, molecule=molecule, **kwargs)\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 2288, in run_scf\n    scf_wfn = scf_helper(name, post_scf=False, **kwargs)\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 1568, in scf_helper\n    e_scf = scf_wfn.compute_energy()\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 93, in scf_compute_energy\n    raise e\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 86, in scf_compute_energy\n    self.iterations()\n  File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 464, in scf_iterate\n    raise SCFConvergenceError("""SCF iterations""", self.iteration_, self, Ediff, Dnorm)\npsi4.driver.p4util.exceptions.SCFConvergenceError: Could not converge SCF iterations in 300 iterations.\n'))
      

Next steps

  • DD: the SCFConvergenceError raised in the psi4 driver doesn’t appear to propagate up through QCEngine as a typed error; it is reported as unknown_error instead. We definitely see at least one instance of this for 34752921

  • [decision] Pavan and David will each investigate 34752766, 34754174 more thoroughly, as these both more consistently yield “Unknown Error”s

    • we’ll reconvene on Friday afternoon at 2pm PT to compare results
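
As a starting point for the error-reporting work, the mismatch seen for 34752921 can be detected from the traceback text alone. The sketch below is not QCEngine code: `refine_error_type`, the `KNOWN_EXCEPTIONS` table, and the `convergence_error` label are all hypothetical. It assumes the last non-empty line of `error_message` has the usual Python form `package.module.ExceptionName: message`.

```python
import re

# Hypothetical mapping from exception class names to short error labels.
KNOWN_EXCEPTIONS = {
    "SCFConvergenceError": "convergence_error",
}


def refine_error_type(error_type: str, error_message: str) -> str:
    """Return a more specific label when an unknown_error wraps a known exception."""
    if error_type != "unknown_error":
        return error_type
    lines = [line for line in error_message.splitlines() if line.strip()]
    if not lines:
        return error_type
    # The final traceback line names the exception: "pkg.mod.ExceptionName: message"
    match = re.match(r"^\s*([\w.]+):", lines[-1])
    if match is None:
        return error_type
    exception_name = match.group(1).rsplit(".", 1)[-1]
    return KNOWN_EXCEPTIONS.get(exception_name, error_type)


# Final line of the traceback recorded in these notes for 34752921.
message = (
    "QCEngine Unknown Error: Traceback (most recent call last):\n"
    "psi4.driver.p4util.exceptions.SCFConvergenceError: "
    "Could not converge SCF iterations in 300 iterations.\n"
)
print(refine_error_type("unknown_error", message))
```

Cases that still come back as `unknown_error` after this check, or whose message is empty, are the ones worth passing on to the psi4 developers.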

Action items

  •  Pavan Behara will investigate 34752766, 34754174 cases for Friday
  •  David Dotson will investigate 34752766, 34754174 cases for Friday

Decisions