2020-10-16 Wagner Gokey Check in Meeting notes

Date

Oct 16, 2020

Participants

  • @Jeffrey Wagner

  • @Trevor Gokey

Discussion topics

Notes

  • What should “Standards” do?

    • QCSubmit should be able to load “Standards” versions with a bunch of preset values for how to operate

    • A new user should be able to read the “Standards” and submit their own dataset which could be used for fitting a main-line OpenFF force field

    • A person should be able to read the “Standards” and do compatible calculations in another engine, like Gaussian

    • “Standards” should provide a protocol

    • The administrator for the dataset pipeline should look at the Standards and see that completed jobs should have hessians triggered

  • How can we make standards separable?

    • Is a hierarchical structure possible? Like, “use all the settings under the Optimization branch of the hierarchy, but don’t take the entire Hessian hierarchy”

  • What do we want to make happen?

    • Anytime we say “X molecule can be used for a FF fit”, we have a strict checklist for what information it must contain

      • How we make it happen: ???

    • Anytime someone says “I want to submit a molecule to be used in an FF fit”, we have a strict checklist for how to submit it so that it yields the information it will need to contain

      • One path: Make QCSubmit able to apply/enforce this, with human review of squishy things like dataset name/description

      • Another path: Complete human review of submissions, with a checklist that says “make sure it follows these 20 rules”

      • Another path: Some other program/automation to enforce standards

    • Anytime someone says “I just want to do optimizations in the normal OpenFF way”, they are able to take a SUBSET of our standards and apply them

      • We won’t prioritize supporting this, but it’s likely that QCSubmit will have friendly ways to apply the standards and remove some parts of the pipeline

  • Conforming to the DATASET standards, or the FITTING standards?

    • DATASET standards are a MINIMUM that ALL datasets must adhere to, to make sure that we don’t get badly-named or badly-described datasets

      • These will mostly require human review

    • FITTING standards are a larger set of standards (a strict superset of DATASET standards) that MUST COMPLETELY be followed by datasets intended for main-line FF fitting.

      • “A checklist to make sure the output will be suitable for ForceBalance”

      • These should be largely machine-validatable

  • Might be good to make 3 things:

    • DATASET standards (human readable document)

    • FITTING standards (human readable technical document)

    • “Best Practices/tutorial/ADHERENCE” document (practical usage of the available tools to make a dataset that conforms to the standards)

  • Questions like “do we filter out confs with hydrogen bonds?” are out of scope of this document
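
Two of the ideas above can be sketched in code: taking only one branch of a hierarchical "Standards" tree, and treating the FITTING checklist as a strict superset of the DATASET checklist. This is a minimal illustration only, not QCSubmit's API. Every name in it (STANDARDS, select_branches, DATASET_CHECKS, FITTING_CHECKS, failed_checks) and every setting or rule shown is a hypothetical placeholder, not an agreed standard.

```python
from copy import deepcopy

# Hypothetical standards tree: branch names and settings are placeholders,
# not real OpenFF values.
STANDARDS = {
    "optimization": {"engine": "placeholder-engine", "max_iterations": 100},
    "hessian": {"trigger_after_optimization": True},
}


def select_branches(standards, branches):
    """Return an independent copy holding only the requested top-level branches."""
    unknown = [name for name in branches if name not in standards]
    if unknown:
        raise KeyError(f"Unknown standards branches: {unknown}")
    return {name: deepcopy(standards[name]) for name in branches}


# DATASET checks: the minimum every dataset must pass (illustrative rules only).
DATASET_CHECKS = {
    "has_name": lambda ds: bool(ds.get("name", "").strip()),
    "has_description": lambda ds: bool(ds.get("description", "").strip()),
}

# FITTING checks: everything in DATASET_CHECKS plus extra, assumed requirements.
FITTING_CHECKS = {
    **DATASET_CHECKS,
    "has_hessians": lambda ds: ds.get("hessians_computed", False) is True,
}


def failed_checks(dataset, checks):
    """Return the names of the checks that the dataset does not pass."""
    return [name for name, check in checks.items() if not check(dataset)]


# "Use the Optimization branch, but not the Hessian branch."
optimization_only = select_branches(STANDARDS, ["optimization"])

# A made-up submission with a missing description and no hessians.
example = {"name": "Example set", "description": "", "hessians_computed": False}
print(failed_checks(example, DATASET_CHECKS))
print(failed_checks(example, FITTING_CHECKS))
```

Building FITTING_CHECKS from DATASET_CHECKS keeps the superset relationship true by construction, so a dataset that passes the FITTING checklist necessarily passes the DATASET one. Checks that need human judgment, such as whether a description is actually good, would still sit outside this kind of automation.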



Action items

Decisions