Automated FF coverage gap identification, torsion prioritization, submission generation
Benchmarking (dashboard, etc.)
Discussion topics
Item
Presenter
Notes
Wavefunctions
Trevor
Ordering of wavefunction basis components is canonical on QCArchive; it does not appear to match the order Psi4 uses
BP: Psi4 should probably support the approach taken by QCSchema, not the other way around
JH: will raise an issue on Psi4 about supporting QCSchema’s way of storing wavefunction components, and see whether the devs consider this the desired approach
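A minimal sketch of what converting between two basis-component orderings could look like, assuming the mapping between them is known as a permutation. The function names and the toy permutation are hypothetical; the actual QCSchema and Psi4 orderings are not specified in these notes.

```python
from typing import List, Sequence


def _check_permutation(perm: Sequence[int], size: int) -> None:
    """Raise if `perm` is not a permutation of 0..size-1."""
    if sorted(perm) != list(range(size)):
        raise ValueError("perm must be a permutation of range(len(matrix))")


def reorder_rows(matrix: Sequence[Sequence[float]], perm: Sequence[int]) -> List[List[float]]:
    """Reorder the rows of an AO-by-MO matrix so that new row i is old row perm[i]."""
    _check_permutation(perm, len(matrix))
    return [list(matrix[i]) for i in perm]


def reorder_square(matrix: Sequence[Sequence[float]], perm: Sequence[int]) -> List[List[float]]:
    """Reorder both rows and columns of an AO-by-AO matrix (e.g. a density matrix)."""
    _check_permutation(perm, len(matrix))
    return [[matrix[i][j] for j in perm] for i in perm]


# Toy data only: three basis components, two orbitals, hypothetical permutation.
coefficients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reordered = reorder_rows(coefficients, [2, 0, 1])
```

Orbital coefficients need only their rows permuted, while quantities indexed by basis components on both axes need the same permutation applied to rows and columns.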
BCC Refit Study
Josh
Ready to go; just needs a re-roll of the dataset with QCSubmit
ANI1ccx
Josh
We’ll hold off on submission; want to let ANI2X make more progress before tossing this in; torsiondrives are proceeding, but slowly
PEPCONF
David
Trevor reviewed, ready to go. One concern is large molecules with many conformers. This will be a big dataset, and some molecules may take a long time to compute.
Asked John for thoughts. Should be ready to proceed once we have them.
Phenyl Set 1 v2.0
David
Josh reviewed; comments need to be addressed
Problem molecules converge with the latest Psi4!
David will re-roll with testing given feedback
Protomers/tautomers OptimizationDataset
Pavan
Dataset prepared; Josh provided feedback, Pavan is addressing
Update of an old dataset with new information
MM compute specs for CHO sandbox dataset
David
Ready for review! Trevor is interested in reviewing.
Do we want validations for MM specs that try to call create_openmm_system (to see whether the molecules can be parameterized)?
JH: actually, the step after this, assigning charges, may fail more often
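A minimal sketch of the validation idea above, assuming a hypothetical `parameterize` callable that raises when a molecule cannot be handled. In practice it might wrap `create_openmm_system`, and a charge-assignment failure would surface the same way. `validate_parameterizable` and `fake_parameterize` are illustrative names, not existing QCSubmit functions.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def validate_parameterizable(
    molecules: Iterable[str],
    parameterize: Callable[[str], object],
) -> Tuple[List[str], Dict[str, str]]:
    """Try to parameterize each molecule; return passes and failure reasons."""
    passed: List[str] = []
    failed: Dict[str, str] = {}
    for mol in molecules:
        try:
            parameterize(mol)
        except Exception as exc:  # any failure marks the molecule as invalid
            failed[mol] = f"{type(exc).__name__}: {exc}"
        else:
            passed.append(mol)
    return passed, failed


def fake_parameterize(smiles: str) -> None:
    """Hypothetical stand-in: pretend molecules containing 'B' lack parameters."""
    if "B" in smiles:
        raise ValueError("no parameters for boron")


passed, failed = validate_parameterizable(["CCO", "CB", "c1ccccc1"], fake_parameterize)
```

Collecting the failure reason per molecule, rather than stopping at the first error, lets a reviewer see every problem molecule in one pass.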
STANDARDSv3
Trevor
STANDARDSv3 is for our consumption, guides our approach to datasets, automation
DD: are there upstream changes we need to support the behaviour we want?
metadata overwriting is needed
should work, will need to test
Ready for review! TG would like reviews from everyone if possible
DD: Aiming for adoption by Dec. 1; once adopted, we can then spin out issues everywhere to implement the vision
TG: CMILES for fitting is a hard problem, but worth trying to plot a course here; at the very least, filter out molecules whose final structures don’t produce the same CMILES as on entry; would like the solution to work in RDKit at some point
TG: Reliable CMILES generation is dependent on good provenance for software versions, so making this
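The CMILES filter TG describes could be sketched roughly as follows, assuming each entry records its CMILES at submission and that some function can regenerate a CMILES from the final structure. The entry layout, `filter_changed_cmiles`, and the toy lookup standing in for a real CMILES generator are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def filter_changed_cmiles(
    entries: Sequence[Dict[str, str]],
    final_cmiles_of: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Split entry names into those whose final structure keeps the entry CMILES
    and those whose final structure yields a different one."""
    kept: List[str] = []
    dropped: List[str] = []
    for entry in entries:
        if final_cmiles_of(entry["final_structure"]) == entry["cmiles"]:
            kept.append(entry["name"])
        else:
            dropped.append(entry["name"])
    return kept, dropped


# Toy data only: structure IDs mapped to the CMILES a real generator might return.
toy_lookup = {"geom-1": "CCO", "geom-2": "C=CO"}
entries = [
    {"name": "mol-a", "cmiles": "CCO", "final_structure": "geom-1"},
    {"name": "mol-b", "cmiles": "CC=O", "final_structure": "geom-2"},
]
kept, dropped = filter_changed_cmiles(entries, toy_lookup.__getitem__)
```

Passing the generator in as a callable keeps the filter independent of any particular toolkit or software version.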
Dataset index
Josh
JH: need to add the function to PR
Local Optimization Executor
Trevor
TG: Working on it
Do we need more compute?
David
TG: I think we’re okay; we have a lot of datasets right now
Can we measure e.g. optimizations per day?
TG: Could probably get this from the database itself; recall Daniel provided these numbers regularly.
BP: Can we assemble a document listing the metrics we would like to see or that would be useful for decision-making? We want to know this for the dashboard.
DD: I’ll create a thread on Slack for assembling desired metrics
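As one example of such a metric, optimizations per day could be derived from completion timestamps roughly like this. This is a sketch assuming the timestamps have already been pulled from the database as ISO 8601 strings; the function name and the sample values are made up for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def completions_per_day(timestamps: Iterable[str]) -> Dict[str, int]:
    """Count completed records per calendar day from ISO 8601 timestamps."""
    counts = Counter(
        datetime.fromisoformat(ts).date().isoformat() for ts in timestamps
    )
    return dict(counts)


# Toy timestamps only, not real QCArchive data.
sample = [
    "2020-11-03T08:15:00",
    "2020-11-03T21:40:12",
    "2020-11-04T02:05:30",
]
per_day = completions_per_day(sample)
```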
JH: Is there any plan on QCArchive for a tool that can give predictions on the length of time a calculation will take?
BP: There are issues with this, but we are working on making this work
JH: Does it only work with single-point data, or things like Optimizations?
TG: Seconds per electron would be useful
DD: Having a validation step that uses the model to give a resource use estimate would be incredibly useful for dataset submission decision-making, capacity planning
JH: If there are datasets needed to help with this work, we’d be able to assist
TG: interested in exploring this; will give this a shot from manager data
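A rough sketch of the seconds-per-electron idea, assuming past records provide an electron count and a wall time per calculation. Linear scaling is a deliberately crude assumption here, since real costs typically grow faster than linearly with system size; the function names and numbers are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def seconds_per_electron(history: Sequence[Tuple[int, float]]) -> float:
    """Average wall seconds per electron over finished calculations.

    `history` holds (n_electrons, wall_seconds) pairs.
    """
    total_electrons = sum(n for n, _ in history)
    total_seconds = sum(s for _, s in history)
    if total_electrons == 0:
        raise ValueError("history contains no electrons to average over")
    return total_seconds / total_electrons


def estimate_dataset_seconds(electron_counts: Iterable[int], rate: float) -> float:
    """Estimate total wall time for a dataset, assuming cost is linear in electrons."""
    return sum(n * rate for n in electron_counts)


# Toy numbers only, not measured manager data.
rate = seconds_per_electron([(10, 100.0), (30, 300.0)])
estimate = estimate_dataset_seconds([20, 40], rate)
```

An estimate like this could feed the validation step DD mentions, flagging a submission whose predicted cost exceeds available capacity.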
PCM support
Josh
PCM support should all be in place; will be testing with Simon’s dataset
Public QCA
Ben
Still getting issues with services failing (e.g. torsiondrives). Need to solve this today before managers are spun up.
DD: we’ll hold off on manager upgrade and deployment until Ben gives the go-ahead.