2020-10-16 QCA Submission Meeting notes

Date

Oct 16, 2020

Participants

  • @David Dotson

  • @Pavan Behara

  • Ben Pritchard

  • @Trevor Gokey

  • @Joshua Horton

Goals

  • New advancements

    • Compressed dataset.json handling in QCSubmit

    • Large performance improvements via multiprocessing in QCSubmit

    • Datasets in QCSubmit sped up in many areas; large datasets handled more smoothly

  • New submissions

    • TG: PhAlkEthOH dataset (submitted!)

    • JH: XTB methods on JACS set (submitted!)

    • DD: Jessica Maat’s Phenyl resubmission (in preparation)

    • JH: Protein Fragments Optimizations (resubmit with version bump, constraint indices 1-based)

    • DD: MM/ML on unfragmented JACS set (blocked)

    • TG: Enamine REAL subset Optimization

  • Upcoming infrastructure improvements

    • STANDARDS-based versioning #137

    • Dataset index on qca-dataset submission #147

  • Upcoming science support

    • Selection of TorsionDrives from a dataset by SMARTS matching

    • PCM-based implicit solvent pathway

    • ESPs and wavefunction storage

  • Larger advances

    • Automated FF coverage gap identification, torsion prioritization, submission generation

    • Benchmarking (dashboard, etc.)

Discussion topics

Item

Presenter

Notes

Server compression

Ben

  • Will be downtime at some point, probably next week

basis-set-exchange release today

Ben

  • Hoping to get this out today

max-services

Trevor

  • What’s the limit set to on the server?

    • BP: 500, but I see roughly 600-700 running

    • system load currently 1-2/16

    • 34GiB/500GiB memory use

Performance improvements in QCSubmit

Trevor

  • New release of QCSubmit should include this.

  • JH: New release is up; OpenEye hard dependency is gone

  • JH: conformers lazy loaded in datasets now; saves a ton of time when conformers not actually needed
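
The lazy loading of conformers mentioned above can be sketched as follows. This is a minimal illustration, not QCSubmit's code: the `Entry` class and the `generate_conformers` function are hypothetical stand-ins, and the "conformers" are placeholder strings rather than real geometries.

```python
from functools import cached_property


def generate_conformers(smiles: str) -> list[str]:
    """Hypothetical stand-in for an expensive conformer generation step."""
    generate_conformers.calls += 1
    return [f"{smiles}-conf{i}" for i in range(3)]


# Call counter, used only to show how often the expensive step actually runs.
generate_conformers.calls = 0


class Entry:
    """Hypothetical dataset entry; not the QCSubmit class."""

    def __init__(self, smiles: str) -> None:
        self.smiles = smiles

    @cached_property
    def conformers(self) -> list[str]:
        # Runs on first access only; the result is then cached on the instance.
        return generate_conformers(self.smiles)


entries = [Entry(s) for s in ("CCO", "c1ccccc1", "CC(=O)O")]
names = [entry.smiles for entry in entries]  # touches no conformers
first = entries[0].conformers                # triggers generation for one entry
again = entries[0].conformers                # served from the cache
```

Operations that only need entry metadata, such as listing names, never pay the conformer cost, which is the kind of saving the note describes.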

Dataset index

Josh

  • JH: Putting this together in a PR

    • PR is up! Trevor will review

    • DD: please also add a checkbox to the PR template to make sure this index gets updated

Jessica Maat’s dataset

David

  • Re-use the same conformers; will need to experiment with maxiter for SCF iterations for the iodine cases

  • TG: will experiment with the SCF convergence tolerance as an alternative approach to the iodine cases

Protein Fragments Optimizations

Josh

  • JH: just need to recompress, get it all green

  • DD: Trevor will review; can leave to JH discretion on when to merge/launch

  • DD: Do we want a version without dihedral constraints? The constraints were in service to Dave Cerutti, but I don’t think we plan to, or can, use them for FF fitting.

  • TG: Will verify with Mobley that we want an unconstrained version; 80% sure we do.

Enamine REAL

Trevor

  • Makes sense to treat this as a big pool of molecules for later

    • DD: agreed, and no great need at this time; compute saturated

STANDARDSv3

Trevor

  • Still scoping this out; has a good direction and is working on a draft

TorsionDrives SMARTS matching

David

  • Pavan interested in doing work on this; DD will schedule time next week to pair program initial approach; all welcome to attend

    • 9am Tuesday proposed

Automated FF coverage

David

  • TG: would probably take Enamine REAL, do MM coverage

    • Enamine REAL is a bunch of SMILES

    • Will work on a prototype

Action items

@Joshua Horton will finish out qca-dataset-submission’s INDEX.md; add a checkbox to the PR template to confirm that each submission PR adds a row to INDEX.md.
@David Dotson will finish out the phenyl resubmission for Jessica Maat using the same conformers; prior to submission, experiment with maxiter for SCF iterations on the iodine cases to ensure convergence is possible at all
@Trevor Gokey will experiment with SCF convergence tolerance as an alternative approach for the phenyl molecules that include iodine
@Joshua Horton will finish preparing the constrained optimizations for protein fragments submission; can merge at discretion after review
@David Dotson will schedule a time next week with @Pavan Behara to pair-prototype SMARTS matching for TorsionDrive datasets
@Trevor Gokey will prototype usage of Enamine REAL for MM coverage analysis; this is the start of our work on gap-filling automation and is of scientific interest generally

Decisions