2021-03-26 QCA Submission Meeting notes

Date

Mar 26, 2021

Participants

  • @David Dotson

  • @Simon Boothroyd

  • @Pavan Behara

  • @Joshua Horton

  • @Trevor Gokey

  • Ben Pritchard

Goals

  • Highest priority

    • Amide Torsion Set progress

  • New advancements

    • Tautomer enumeration in QCSubmit

    • New fragmentation method from fragmenter into QCSubmit

    • New dzvp basis with iodine support

  • New submissions

    • Industry submissions

Discussion topics

Item

Presenter

Notes


Amide Torsion Set

Simon

  • Lots of errors; not sure what’s going on

    • DD: I re-deployed the PRP environment; it’s a bit of a blunt instrument, so many of these errors may be due to killing the old deployment

    • The tasks will re-run since they’ve error cycled

Tautomer enumeration

Josh

  • JH: fixed a bug in QCSubmit’s tautomer enumeration that was dropping the original molecule geometries

  • TG: the torsion indexer needs specific atom indices, which is a problem for tautomer enumeration (atom counts change); so I enumerate tautomers first, then do torsion assignment

New fragmentation method from fragmenter into QCSubmit

Josh

  • JH: running on the JACS set to see what fragments we get via the Pfizer method

    • JH: fragmentation introduces some headaches with workflow component order; the results are very order-dependent, and I’m not sure of a good solution yet

New dzvp basis with iodine

 

  • New basis should be in the next psi4 release; there is no conda package for Linux yet

    • TG: not sure how we want to handle this; errored iodine cases will work if we re-deploy stack with new psi4, but completed (erroneous) iodine cases will be forever wrong

    • DD: first step: inventory the datasets, identify which contain at least one iodine-containing molecule, and count how many molecules in each are affected; the action we take depends on how pervasive the problem is across our collection of sets

  • TG: in the STANDARDS, we’ll want to make use of the flags we developed for this on the old datasets, such as non-compliant, wontfix, etc.

  • DD: we shouldn’t submit any iodine-containing datasets until we have the new psi4 deployed
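
The inventory step DD describes could look roughly like the sketch below. It assumes each molecule is available as a plain list of element symbols; the dataset names and the `iodine_inventory` helper are hypothetical, and pulling the molecules from the server is left out.

```python
def iodine_inventory(datasets):
    """Map dataset name -> (iodine-containing molecules, total molecules).

    `datasets` maps a dataset name to a list of molecules, where each
    molecule is a list of element symbols (an assumed representation).
    Only datasets with at least one iodine-containing molecule are reported.
    """
    report = {}
    for name, molecules in datasets.items():
        n_iodine = sum(1 for symbols in molecules if "I" in symbols)
        if n_iodine:
            report[name] = (n_iodine, len(molecules))
    return report


# Hypothetical example data, not real dataset contents.
datasets = {
    "hypothetical-set-a": [["C", "H", "H", "H", "I"], ["C", "O", "O"]],
    "hypothetical-set-b": [["N", "H", "H", "H"]],
}
inventory = iodine_inventory(datasets)
print(inventory)
```

Matching on the exact symbol "I" avoids false hits from elements such as Ir or In, which a substring search over SMILES would pick up.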

Genentech set2

Pavan

  • PB: please move this set to scientific review; no longer making progress; 50% completed, 50% errored

    • DD: please let us know whether the completed portion is useful, and what you find about why so many cases errored

    • PB: there is iodine in the set; may be a factor

MM on CHO sandbox submitted

David

  • DD: finally on the server; Ben’s improvements to table indices gave enough of a performance boost for this to go through

Industry submissions

David

  • Any reservations on submitting the 7 industry sets?

    • some contain funny elements like Si

  • JH+TG+SB: filter out elements that don’t work with our FFs for a start; also really interested in having just one, citable dataset for the QM

  • DD: will formulate a path given the objective: a single, clean, citable dataset in the end, including all molecules from all partners. Ideally, QM and MM would also be included in the same dataset, even if the QM is “pre-optimized”.

  • SB: interested in getting rapid feedback with some of these datasets, so please do prioritize getting some of them
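
A minimal sketch of the element filtering suggested above, again assuming molecules are given as lists of element symbols. The `ALLOWED_ELEMENTS` set and the `split_by_elements` helper are illustrative assumptions, not the agreed list of elements our FFs support.

```python
# Assumed allow-list for illustration only; the real set should come
# from whatever the target force fields actually cover.
ALLOWED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "P", "S", "Cl", "Br"})


def split_by_elements(molecules, allowed=ALLOWED_ELEMENTS):
    """Split molecules into (kept, dropped) based on their elements.

    A molecule is kept only if every one of its element symbols is in
    `allowed`; otherwise it goes into `dropped` so it can be reported.
    """
    kept, dropped = [], []
    for symbols in molecules:
        if set(symbols) <= allowed:
            kept.append(symbols)
        else:
            dropped.append(symbols)
    return kept, dropped


# Hypothetical molecules: one plain organic, one with Si, one with I.
kept, dropped = split_by_elements([["C", "H"], ["Si", "H"], ["C", "I"]])
print(len(kept), "kept;", len(dropped), "dropped")
```

Returning the dropped molecules rather than discarding them keeps a record of what each partner set lost to the filter.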

Database duplicates

Ben

  • BP: there are ~200 duplicate molecules, which cause duplicate optimizations and torsiondrives

    • punting on it for now; the duplicates result from the server not really being thread/process safe and the hash column not being unique
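
For illustration, the general idea of letting a unique hash column reject duplicate inserts can be sketched with the standard-library `sqlite3` module. This is not the server's actual schema or database; the table layout and the `molecule_hash` and `insert_molecule` helpers are assumptions made for the example.

```python
import hashlib
import sqlite3


def molecule_hash(symbols, geometry):
    """Hash the symbols and rounded coordinates (an assumed hashing scheme)."""
    payload = repr((tuple(symbols), tuple(round(x, 8) for x in geometry)))
    return hashlib.sha1(payload.encode()).hexdigest()


def insert_molecule(conn, symbols, geometry):
    """Insert the molecule unless its hash already exists; return its row id."""
    h = molecule_hash(symbols, geometry)
    # The UNIQUE constraint makes the database, not the caller, the arbiter,
    # so concurrent writers cannot both add the same hash.
    conn.execute("INSERT OR IGNORE INTO molecule (hash) VALUES (?)", (h,))
    return conn.execute("SELECT id FROM molecule WHERE hash = ?", (h,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule (id INTEGER PRIMARY KEY, hash TEXT NOT NULL UNIQUE)")

h2 = (["H", "H"], [0.0, 0.0, 0.0, 0.0, 0.0, 0.74])
first = insert_molecule(conn, *h2)
second = insert_molecule(conn, *h2)  # same molecule submitted again
count = conn.execute("SELECT COUNT(*) FROM molecule").fetchone()[0]
print(first == second, count)
```

Both calls return the same row id and the table holds a single row, so downstream optimizations and torsiondrives would reference one molecule record.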

Test server is up

Ben

  • BP: test server is up, entirely disposable, feel free to throw small datasets on it for e.g. prototyping STANDARDSv3 implementation, client refactor

Action items

@Joshua Horton will prepare a single, citable QCArchive QM optimization submission containing all industry datasets, with problematic elements like Si, B, and I filtered out.

Decisions