2021-02-10 QCArchive - PEPCONF Investigation Meeting notes

Date

Feb 10, 2021

Participants

  • @David Dotson

  • @Pavan Behara

Goals

  • Identify optimization specimens reproducing “Unknown Errors” in production

  • Detail next steps for investigation and action to:

    • improve error reporting in QCEngine Psi4Harness

    • share problem cases with psi4 developers / quantum chemists to determine possible solutions

Discussion topics

Item

Presenter

Notes

Optimization failure specimen

Pavan

  • Optimization ID: 34752921

    • No psi4 output, fails on first iteration; geomeTRIC then chokes on no data to operate on

    • psi4 log from file is truncated; consistent with psi4 dying abruptly

    • Used --messy in Psi4Harness to preserve file outputs

    • Also put together a script to run a single-point calculation; should produce the same result

    • during error cycling, this one yields an SCF convergence error in at least one case and an unknown error in at least one other

  • Optimization ID: 34752766

    • consistently shows up as unknown error or timed out on error cycling

  • Optimization ID: 34754174

    • consistently shows up as unknown error in error cycling

Script for reproducing results

Pavan

from openforcefield.topology import Molecule
import qcengine
from qcelemental.models import AtomicInput, OptimizationInput
from qcelemental.models.common_models import Model
from qcelemental.models.procedures import QCInputSpecification
import time

qcel_mol = dict({
    'schema_name': 'qcschema_molecule',
    'schema_version': 2,
    'validated': True,
    'symbols': ['O', 'O', 'O', 'O', 'O', 'C', 'C', 'C', 'C', 'N', 'N', 'N',
                'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'N',
                'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
                'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H'],
    'geometry': [
        [-6.85549362e+00, 5.47263221e+00, -7.91620020e-01],
        [-6.09996888e+00, 3.73272044e+00, 5.36105059e+00],
        [-2.25309022e+00, -5.16413070e-01, -2.82277346e+00],
        [1.06456093e+00, -5.02849477e+00, 2.85972315e+00],
        [-1.02057886e+01, 3.19498085e+00, 6.38068449e+00],
        [-7.63636902e+00, 3.58961518e+00, -2.08465129e+00],
        [-7.99626705e+00, 2.34278525e+00, 5.17868413e+00],
        [-2.43434662e+00, -1.84480750e-01, -4.83915730e-01],
        [2.62393077e+00, -4.15006357e+00, 1.28072634e+00],
        [4.77578996e+00, -5.58614171e+00, 7.70056030e-01],
        [-6.66219637e+00, 1.17397580e+00, -1.46107784e+00],
        [-3.60192510e-01, -7.28054700e-01, 1.06375098e+00],
        [-9.47728705e+00, 4.04038190e+00, -4.11757290e+00],
        [-8.17540217e+00, -7.49641700e-02, 3.85301030e+00],
        [6.63575661e+00, -3.68688560e-01, 1.36631900e-01],
        [6.78411089e+00, -5.11592810e-01, -2.71665048e+00],
        [-5.85267366e+00, -9.63103850e-01, 2.56608701e+00],
        [3.99821435e+00, 2.29254750e-01, 9.28320550e-01],
        [9.48534052e+00, -1.11783055e+00, -3.44108157e+00],
        [-4.86248398e+00, 7.77298940e-01, 5.35042790e-01],
        [2.04476167e+00, -1.67525524e+00, 1.01865120e-01],
        [1.11301882e+01, 8.54402510e-01, -2.50661962e+00],
        [5.77336114e+00, -6.44943022e+00, 2.21597558e+00],
        [5.40542122e+00, -5.80792492e+00, -1.06924251e+00],
        [-7.28527743e+00, -3.66056620e-01, -2.50030800e+00],
        [-4.68861140e-01, -4.65824930e-01, 3.01559756e+00],
        [-1.08410818e+01, 2.48751058e+00, -4.39050708e+00],
        [-8.50930310e+00, 4.37890762e+00, -5.96685880e+00],
        [-1.05931624e+01, 5.74501939e+00, -3.62226351e+00],
        [-8.64203211e+00, -1.54344897e+00, 5.33175395e+00],
        [-9.87294500e+00, -1.17790500e-02, 2.62794374e+00],
        [7.80117252e+00, 1.34966238e+00, 6.51788560e-01],
        [7.46705658e+00, -2.02231415e+00, 1.03913464e+00],
        [6.19573927e+00, 1.36697353e+00, -3.47446245e+00],
        [5.52569530e+00, -1.97528834e+00, -3.49426273e+00],
        [-4.39623371e+00, -1.40984956e+00, 3.99569247e+00],
        [-6.30394383e+00, -2.78542805e+00, 1.61426803e+00],
        [3.80986311e+00, 4.27550830e-01, 3.00371152e+00],
        [3.47512355e+00, 2.13457946e+00, 1.47040350e-01],
        [9.53549828e+00, -1.16640291e+00, -5.55563998e+00],
        [9.98238317e+00, -2.95885003e+00, -2.62218022e+00],
        [-4.35350739e+00, 2.60267722e+00, 1.50390262e+00],
        [1.98932809e+00, -1.89377873e+00, -1.98558053e+00],
        [1.00910926e+01, 2.49700644e+00, -2.22755224e+00],
        [1.19005249e+01, 2.01640330e-01, -8.20297820e-01],
        [1.26429939e+01, 1.16188463e+00, -3.74007670e+00],
    ],
    'name': 'C13H24N4O5',
    'identifiers': {
        'molecule_hash': 'fa1a64790c34b63c846295fa43a3a9b52777626b',
        'molecular_formula': 'C13H24N4O5',
    },
    'molecular_charge': 0.0,
    'molecular_multiplicity': 1,
    'connectivity': [
        (0, 5, 2.0), (1, 6, 2.0), (2, 7, 2.0), (3, 8, 2.0), (4, 6, 1.0),
        (5, 10, 1.0), (5, 12, 1.0), (6, 13, 1.0), (7, 11, 1.0), (7, 19, 1.0),
        (8, 9, 1.0), (8, 20, 1.0), (9, 22, 1.0), (9, 23, 1.0), (10, 19, 1.0),
        (10, 24, 1.0), (11, 20, 1.0), (11, 25, 1.0), (12, 26, 1.0), (12, 27, 1.0),
        (12, 28, 1.0), (13, 16, 1.0), (13, 29, 1.0), (13, 30, 1.0), (14, 15, 1.0),
        (14, 17, 1.0), (14, 31, 1.0), (14, 32, 1.0), (15, 18, 1.0), (15, 33, 1.0),
        (15, 34, 1.0), (16, 19, 1.0), (16, 35, 1.0), (16, 36, 1.0), (17, 20, 1.0),
        (17, 37, 1.0), (17, 38, 1.0), (18, 21, 1.0), (18, 39, 1.0), (18, 40, 1.0),
        (19, 41, 1.0), (20, 42, 1.0), (21, 43, 1.0), (21, 44, 1.0), (21, 45, 1.0),
    ],
    'fix_com': True,
    'fix_orientation': True,
    'fix_symmetry': 'c1',
    'provenance': {
        'creator': 'QCElemental',
        'version': 'v0.17.0',
        'routine': 'qcelemental.molparse.from_schema',
    },
    'id': '24773736',
    'extras': {
        'canonical_isomeric_explicit_hydrogen_mapped_smiles': '[O:1]=[C:6]([N:11]([C@:20]([C:8](=[O:3])[N:12]([C@:21]([C:9](=[O:4])[N:10]([H:23])[H:24])([C:18]([C:15]([C:16]([C:19]([N+:22]([H:44])([H:45])[H:46])([H:40])[H:41])([H:34])[H:35])([H:32])[H:33])([H:38])[H:39])[H:43])[H:26])([C:17]([C:14]([C:7](=[O:2])[O-:5])([H:30])[H:31])([H:36])[H:37])[H:42])[H:25])[C:13]([H:27])([H:28])[H:29]',
    },
})

psi4_model = Model(method="B3LYP-D3BJ", basis="DZVP")

start = time.time()
qc_task = AtomicInput(
    molecule=qcel_mol,
    driver="energy",
    model=psi4_model,
    keywords={
        'maxiter': 300,
        'scf_properties': ['dipole', 'quadrupole', 'wiberg_lowdin_indices', 'mayer_indices'],
    },
)

# compute the energy
result = qcengine.compute(input_data=qc_task, program="psi4")
end = time.time()

print("Time taken for one single point energy calculation is:", end - start)
print(result)
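Printing the whole result object makes a failure hard to read, because the traceback is embedded in a single string. A minimal sketch of a more readable summary follows. It assumes the returned object exposes `success` and, on failure, `error.error_type` and `error.error_message`, as the `FailedOperation` returned by `qcengine.compute` does. The helper name `summarize_result` and the stand-in object are hypothetical and for illustration only.

```python
from types import SimpleNamespace


def summarize_result(result):
    """Return a one-line summary of a compute result (hypothetical helper).

    Assumes `result.success` exists and that failed results carry
    `result.error.error_type` and `result.error.error_message`.
    """
    if result.success:
        return "success"
    # The last non-empty line of the embedded traceback names the exception raised.
    lines = [ln.strip() for ln in result.error.error_message.splitlines() if ln.strip()]
    last_line = lines[-1] if lines else ""
    return f"{result.error.error_type}: {last_line}"


# Stand-in object for illustration only; in the script above this would be
# the value returned by qcengine.compute(...).
failed = SimpleNamespace(
    success=False,
    error=SimpleNamespace(
        error_type="unknown_error",
        error_message=(
            "QCEngine Unknown Error: Traceback (most recent call last):\n"
            "psi4.driver.p4util.exceptions.SCFConvergenceError: "
            "Could not converge SCF iterations in 300 iterations.\n"
        ),
    ),
)
print(summarize_result(failed))
```

In the reproduction script, `print(summarize_result(result))` could replace `print(result)` when only the error type and the final exception are of interest.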

Reproducing result

David

  • Attempting to reproduce 34752921 on a local machine, so that the failure is reproduced independently by two people

    • Getting the following from above script on psi4 1.4a3.dev63+afa0c21:

    • Time taken for one single point energy calculation is: 491.5916678905487

      FailedOperation(error=ComputeError(error_type='unknown_error', error_message='QCEngine Unknown Error: Traceback (most recent call last):\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 411, in run_qcschema\n ret_data = run_json_qcschema(input_model.dict(), clean, False, keep_wfn=keep_wfn)\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/schema_wrapper.py", line 558, in run_json_qcschema\n val, wfn = methods_dict_[json_data["driver"]](method, **kwargs)\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/driver.py", line 576, in energy\n wfn = procedures[\'energy\'][lowername](lowername, molecule=molecule, **kwargs)\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 2288, in run_scf\n scf_wfn = scf_helper(name, post_scf=False, **kwargs)\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/proc.py", line 1568, in scf_helper\n e_scf = scf_wfn.compute_energy()\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 93, in scf_compute_energy\n raise e\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 86, in scf_compute_energy\n self.iterations()\n File "/home/david/.conda/envs/qcarchive-worker-openff-psi4/lib//python3.7/site-packages/psi4/driver/procrouting/scf_proc/scf_iterator.py", line 464, in scf_iterate\n raise SCFConvergenceError("""SCF iterations""", self.iteration_, self, Ediff, Dnorm)\npsi4.driver.p4util.exceptions.SCFConvergenceError: Could not converge SCF iterations in 300 iterations.\n'))

Next steps

 

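  • DD: the SCFConvergenceError raised in the psi4 driver doesn’t appear to be propagated through QCEngine as a specific error type; it surfaces as error_type='unknown_error' with the traceback embedded in the error message. We definitely see at least one instance of this for 34752921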

  • [decision] Pavan and David will each investigate 34752766, 34754174 more thoroughly, as these both more consistently yield “Unknown Error”s

    • we’ll reconvene on Friday afternoon at 2pm PT to compare results
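One way to think about improved error reporting is to classify a failure by the exception named on the last line of the embedded traceback. The sketch below is an illustration under stated assumptions, not QCEngine's actual behavior: the helper `classify_error`, the mapping `KNOWN_EXCEPTIONS`, and the label `convergence_error` are all hypothetical.

```python
import re

# Hypothetical mapping from exception class names to error_type labels.
# Only SCFConvergenceError is listed because it is the one seen in these notes.
KNOWN_EXCEPTIONS = {"SCFConvergenceError": "convergence_error"}


def classify_error(error_message):
    """Return an error_type label based on the final traceback line."""
    lines = [ln for ln in error_message.splitlines() if ln.strip()]
    if not lines:
        return "unknown_error"
    # The final line of a Python traceback has the form "<dotted.name>: <message>".
    match = re.match(r"\s*([\w.]+):", lines[-1])
    if match is None:
        return "unknown_error"
    # Keep only the class name, dropping the module path.
    exception_name = match.group(1).rsplit(".", 1)[-1]
    return KNOWN_EXCEPTIONS.get(exception_name, "unknown_error")


message = (
    "QCEngine Unknown Error: Traceback (most recent call last):\n"
    "psi4.driver.p4util.exceptions.SCFConvergenceError: "
    "Could not converge SCF iterations in 300 iterations.\n"
)
print(classify_error(message))
```

Anything not in the mapping still falls back to `unknown_error`, so cases such as a truncated psi4 log with no traceback at all would remain flagged for manual investigation.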

Action items

@Pavan Behara will investigate 34752766, 34754174 cases for Friday
@David Dotson will investigate 34752766, 34754174 cases for Friday

Decisions