2020-11-13 QCA Submission Meeting notes

Date

Nov 13, 2020

Participants

  • @David Dotson

  • @Trevor Gokey

  • @Pavan Behara

  • @Joshua Horton

  • Ben Pritchard

Goals

Discussion topics

Item

Presenter

Notes

Wavefunctions

Trevor

  • Ordering of wavefunction basis components is canonical on QCArchive; this doesn’t appear to be the same order Psi4 uses

  • BP: Psi4 should probably support the approach taken by QCSchema, not the other way around

  • JH: will raise an issue on Psi4 for supporting QCSchema’s way of storing wavefunction components; see whether the devs consider this the desired approach

BCC Refit Study

Josh

  • Ready to go; just needs a re-roll of the dataset with QCSubmit

ANI1,cxx

Josh

  • We’ll hold off on submission; want to let ANI2X make more progress before tossing this in; torsiondrives are proceeding, but slowly

PEPCONF

David

  • Trevor reviewed, ready to go. One concern is large molecules with many conformers. This will be a big dataset, and some calculations may take a long time to compute.

    • Asked John for his thoughts. Should be ready to proceed once we have them.

Phenyl Set 1 v2.0

David

  • Josh reviewed; comments need to be addressed

    • problem molecules converge with latest psi4!

    • David will re-roll and test, incorporating the feedback

Protomers/tautomers OptimizationDataset

Pavan

  • Dataset prepared; Josh provided feedback, Pavan is addressing

  • Update of an old dataset with new information

MM compute specs for CHO sandbox dataset

David

  • Ready for review! Trevor is interested in reviewing.

  • Do we want validations for MM specs that attempt create_openmm_system (i.e. check whether the molecules can be parameterized)?

    • JH: the step after this, assigning charges, may actually fail more often

STANDARDSv3

Trevor

  • STANDARDSv3 is for our consumption, guides our approach to datasets, automation

  • DD: are there upstream changes we need to support the behaviour we want?

    • metadata overwriting is needed

    • should work, will need to test

  • Ready for review! TG would like reviews from everyone if possible

  • DD: Aiming for adoption by Dec. 1; once adopted, we can then spin out issues everywhere to implement the vision

  • TG: CMILES for fitting is a hard problem, but worth trying to plot a course here; at the very least, filter out molecules whose final structures don’t produce the same CMILES as on entry; would like the solution to work with RDKit at some point

  • TG: Reliable CMILES generation is dependent on good provenance for software versions, so making this
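The minimal filter TG describes (drop molecules whose final structure no longer yields the CMILES it had on entry) can be sketched as below. This is an illustration only: the record layout and the `to_cmiles` callable are hypothetical, and in practice `to_cmiles` would wrap whichever cheminformatics toolkit (e.g. RDKit or the OpenFF toolkit) is used, with its version recorded for provenance.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical record layout: (entry_id, structure on entry, final optimized structure).
Record = Tuple[str, object, object]


def split_by_cmiles_consistency(
    records: Iterable[Record],
    to_cmiles: Callable[[object], str],
) -> Tuple[List[str], List[str]]:
    """Return (kept, dropped) entry ids.

    An entry is kept only if its final structure produces the same CMILES
    as its starting structure; otherwise something (e.g. a proton transfer
    or bond rearrangement) changed the molecule's identity during optimization.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for entry_id, initial, final in records:
        if to_cmiles(initial) == to_cmiles(final):
            kept.append(entry_id)
        else:
            dropped.append(entry_id)
    return kept, dropped
```

Passing the identifier generator in as a callable keeps the filter toolkit-agnostic, which fits the stated goal of eventually supporting RDKit alongside other backends.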

Dataset index

Josh

  • JH: need to add the function to the PR

Local Optimization Executor

Trevor

  • TG: Working on it

Do we need more compute?

David

  • TG: I think we’re okay; we have a lot of datasets right now

    • Can we measure e.g. optimizations per day?

  • TG: Could probably get this from the database itself; recall Daniel provided these numbers regularly.

  • BP: Can we assemble a document listing the metrics we would like to see or that would be useful for decision-making? We need to know this for the dashboard.

  • DD: I’ll create a thread on Slack for assembling desired metrics

  • JH: Are there any plans on QCArchive for a tool that can predict how long a calculation will take?

    • BP: There are challenges with this, but we are working on it

    • JH: Does it only work with single-point data, or things like Optimizations?

    • TG: Seconds per electron would be useful

    • DD: Having a validation step that uses the model to give a resource use estimate would be incredibly useful for dataset submission decision-making, capacity planning

    • JH: If there are datasets needed to help with this work, we’d be able to assist

  • TG: interested in exploring this; will give this a shot using manager data
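The metrics raised above (optimizations per day, seconds per electron, and a simple runtime model from manager data) could be computed roughly as follows. This is a sketch under stated assumptions: the `CompletedTask` record is hypothetical and not the QCFractal schema, and the linear time-vs-electron fit is only a first-pass model; real quantum chemistry cost grows faster than linearly, so a fit in log space may describe the data better.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Tuple


@dataclass
class CompletedTask:
    # Hypothetical record extracted from manager logs; field names are assumptions.
    completed_at: datetime
    wall_seconds: float
    n_electrons: int


def optimizations_per_day(tasks: List[CompletedTask]) -> Dict[date, int]:
    """Count completed tasks per calendar day."""
    return dict(Counter(t.completed_at.date() for t in tasks))


def mean_seconds_per_electron(tasks: List[CompletedTask]) -> float:
    """Average wall time divided by electron count across tasks."""
    return sum(t.wall_seconds / t.n_electrons for t in tasks) / len(tasks)


def fit_time_vs_electrons(tasks: List[CompletedTask]) -> Tuple[float, float]:
    """Ordinary least-squares fit of wall_seconds = intercept + slope * n_electrons."""
    xs = [float(t.n_electrons) for t in tasks]
    ys = [t.wall_seconds for t in tasks]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct electron counts to fit")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_seconds(intercept: float, slope: float, n_electrons: int) -> float:
    """Estimate wall time for a new molecule from the fitted model."""
    return intercept + slope * n_electrons
```

A submission-time validation step, as DD suggests, could then sum `predict_seconds` over a dataset's molecules to give a rough resource estimate before the dataset is accepted.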

PCM support

Josh

  • PCM support should all be in place; will be testing with Simon’s dataset

Public QCA

Ben

  • Still getting issues with services failing (e.g. torsiondrives). Need to solve this today before managers are spun up.

  • DD: we’ll hold off on manager upgrade and deployment until Ben gives the go-ahead.

Action items

@Joshua Horton will raise an issue on Psi4 for supporting QCSchema’s way of storing wavefunction components; see whether the devs consider this the desired approach
@Joshua Horton will re-roll BCC Refit Study dataset with latest QCSubmit, merge when ready
@Trevor Gokey will approve PEPCONF OptimizationDataset when satisfied; @David Dotson will merge when QCArchive server back up
@David Dotson will address review items from Josh on Phenyl Set 1 v2.0
@Trevor Gokey will review MM compute specs for CHO sandbox dataset
Everyone should give STANDARDSv3 review feedback if possible in the next week or so
@Trevor Gokey might explore building a model for OptimizationTask execution time as a function of molecule size / electron number from manager logs on UCI
@David Dotson will start a thread on the #qcfractal channel for desired metrics for decision-making and status of compute; solicit opinions from others outside this meeting group
@Joshua Horton will add the generation script/function(s) used to generate dataset index to PR
@Trevor Gokey will PR his local optimization executor to QCSubmit
Ben Pritchard will hammer out the services issues on the server; notify us when the server is back up in a satisfactory state

Decisions