2022-03-01 QC Meeting notes

 Date

Mar 1, 2022

 Participants

  • @Pavan Behara

  • @Chapin Cavender

  • @Trevor Gokey

  • Ben Pritchard

  • @Jeffrey Wagner

  • @Joshua Horton

 Goals

  • Updates from MolSSI

  • Infrastructure needs/advances

    • David Dotson moved files larger than 10MB to git-lfs and the repo size dropped to 108MB

      • Error cycling and the other GitHub Actions were also updated to match the change and are working as expected

      • Re-clone the repo, and open new PRs to replace any relevant ones that were closed

    • New optimization schema with SCF properties evaluated at the optimized geometry

      • This adds one extra step, a single-point calculation at the final geometry, at an acceptable cost. Any thoughts?

  • Throughput status

    • OpenFF ESP Industry Benchmark Set v1.0: 52534, up from 16170 (~95%, up from 28% last week)

    • Openff dipeptides torsiondrives v2.1: 24/26 TD complete

      • Slow-moving: 577 opts since last week and around 2230+ new opts for the last two modified submissions; this looks to be heading in the right direction

      • CC: Of the two remaining, one may error out, so if we reach 25/26 we can move this to end of life

    • OpenFF Protein Capped 1-mer Sidechains v1.0: 1/46 TD

      • 70793, up from 63554 (7239 more optimizations) since last week

      • Around 1500+ opts per completed torsion scan, nearly 3 opts per grid point (575 grid points IIRC)

    • SPICE PubChem Set 1 Single Points Dataset v1.2: 8.5% from last week

      • Lilac compute fully dedicated to spice sets now

    • SPICE Dipeptides Single Points Dataset v1.2: COMPLETE from 99%

  • New submissions

    • Pubchem set1 submitted (Thanks to David!)

    • Modified submissions: SPICE sets v1.2 (other pubchem sets in queue)

    • SPICE DES370K Single Points Dataset supplement v1.0 (submitted)

  • User questions/issues

    • Can we map a task ID to its dataset name? Or is adding the --verbose flag the only way to see which jobs are being executed on the queue?

  • Science support needs

 Discussion topics

Notes


  • BP: Running out of space and thinking of pruning the Mayer and Wiberg indices, each of size NxN (N = number of atoms)

  • BP – These get stored for every gradient calculation. Deleting the indices from the ESP dataset will free up ~100GB. For some reason, it looks like they’re sometimes duplicated.

    • JW – Is there a way to only store this for the final conformation?

    • BP – No

  • BP – For other datasets, I could delete all the bond indices except for the final step?

    • PB – So future jobs would test whether they’re at the final step and only save the Wiberg+Mayer info then?

      • BP – No, I’m not changing how they’re stored, I’m just going through completed datasets and clearing them out

    • TG – Could we also keep the Mayer+Wiberg info at the first step?

      • BP – Yes

    • BP – From torsiondrives and optimizations, could I delete Mayer+Wiberg except for the first and last conf for each?

      • TG – Yes, any time we’re looking at trajectories or torsiondrives, we don’t need intermediate bond info except at beginning and end.

      • PB – I think it’s fine to delete Mayer+Wiberg info from those places
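The pruning rule agreed on above (keep bond indices only at the first and last step of each optimization) can be sketched roughly as follows. This is only an illustration and not BP's actual script: the trajectory is modelled as a plain list of dictionaries, and the key names `mayer_indices` and `wiberg_lowdin_indices` are assumptions about how the records are labelled.

```python
from typing import Any, Dict, List

# Hypothetical key names; the real records may label these differently.
BOND_INDEX_KEYS = ("mayer_indices", "wiberg_lowdin_indices")


def prune_bond_indices(trajectory: List[Dict[str, Any]]) -> int:
    """Drop bond-index matrices from every step except the first and last.

    Mutates the step dictionaries in place and returns the number of
    entries removed.
    """
    removed = 0
    # trajectory[1:-1] is empty for trajectories with two or fewer steps,
    # so short trajectories are left untouched.
    for step in trajectory[1:-1]:
        for key in BOND_INDEX_KEYS:
            if step.pop(key, None) is not None:
                removed += 1
    return removed


# Toy four-step trajectory for a two-atom molecule (made-up values).
trajectory = [
    {
        "step": i,
        "mayer_indices": [[0.0, 0.98], [0.98, 0.0]],
        "wiberg_lowdin_indices": [[0.0, 1.02], [1.02, 0.0]],
    }
    for i in range(4)
]
removed = prune_bond_indices(trajectory)
```

In this toy case the two intermediate steps lose both matrices, while the first and last steps keep theirs.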

  • Wavefunctions

    • JW: I recall wfns taking up too much space, is that still an issue?

    • BP: Yeah, no datasets are evaluating wfns.

    • https://github.com/openmm/spice-dataset/issues/11#issuecomment-998254619

  • BP – It is possible to run an optimization and not keep all gradients. Could I delete intermediate gradient info from opt trajectories?

    • TG – We’d looked at using this info, didn’t bear much fruit, but we’re still looking at how we can subsample

    • JH – I think espaloma needs these intermediates.

    • BP – In the future, we might think about more targeted datasets, and having explicit options to store gradients/bond indices only for the first/last steps.

  • BP – So, I’m going to

    • delete bond indices from first SPICE dataset,

    • write a script to delete all bond indices except those from the first and last optimization steps (including for torsiondrives)

    • Try deleting wavefunctions from first SPICE dataset

    • PB – I’ll confirm with espaloma team that the above is OK, then will notify BP to go ahead with the above deletion

  • BP – For the migration, I have the optimization and basic datasets pretty close to ready. Reaction datasets are going to be complex.

    • JW – Want to let us know when you’ll do another demo and we can bring in science folks?

    • BP – Yeah, let’s do next week

  • BP – I talked to our hardware vendor about buying a new server, waiting to get quotes.

    • JW – I’d be interested in hearing what they say; let me know about the quotes.

  • JW – Would we want to come up with a step for dataset submission where

    • BP – Wavefunctions: N^2, where N is the number of basis functions

    • BP – Bond indices: N^2 times number of gradient steps…

    • BP – Output: ???

    • DD – I think increasing storage is the better option

    • BP – Also, I think a policy of “only computing what you actually need” is a good idea. Not “well, I might be able to use this in 5 years”

    • DD – What do the defaults look like?

      • PB – We’re explicitly feeding in

    • BP – In an OptimizationProtocol, the valid keywords are something like “none”, “all”, “first”, “last”, and “first and last”. So those could be used immediately if we want.
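As a back-of-the-envelope illustration of the bond-index scaling BP describes (N^2 values per matrix, times the number of gradient steps), the sketch below compares storing every step with storing only the first and last. The molecule size, step count, two matrices per step, and 8-byte values are all assumed numbers for illustration, not measurements from the server.

```python
def bond_index_bytes(
    n_atoms: int,
    n_gradient_steps: int,
    n_matrices: int = 2,       # assumed: one Mayer and one Wiberg matrix per step
    bytes_per_value: int = 8,  # assumed: values stored as 64-bit doubles
) -> int:
    """Estimate raw storage for N x N bond-index matrices over a trajectory."""
    return n_matrices * n_atoms * n_atoms * bytes_per_value * n_gradient_steps


# Hypothetical 50-atom molecule optimized over 100 gradient steps.
all_steps = bond_index_bytes(n_atoms=50, n_gradient_steps=100)
first_and_last = bond_index_bytes(n_atoms=50, n_gradient_steps=2)
savings_fraction = 1 - first_and_last / all_steps
```

Under these assumptions, keeping only the first and last steps removes 98% of the raw bond-index storage for that one trajectory; the real savings depend on trajectory lengths and on how the values are serialized.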

  • TG – If the bond indices are doubles, I’d be OK to demote them to float. We don’t use that many significant figures on those.

    • PB – Yeah, we’re only really interested in the first ~two decimal places.
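TG's suggestion to demote bond indices from doubles to floats can be illustrated with the standard-library `array` module. The bond-order values below are made up, and this says nothing about how the values are actually serialized on the server; it only shows that single precision halves the raw size while preserving far more than two decimal places.

```python
from array import array

# Made-up bond-order values of a typical magnitude.
values = [0.98, 1.02, 1.47, 0.03]

as_double = array("d", values)  # 64-bit floating point
as_float = array("f", values)   # 32-bit floating point

double_bytes = as_double.itemsize * len(as_double)
float_bytes = as_float.itemsize * len(as_float)

# Largest absolute difference introduced by the demotion.
max_error = max(abs(d - f) for d, f in zip(as_double, as_float))

# Both representations agree once rounded to two decimal places.
same_to_two_decimals = all(
    round(d, 2) == round(f, 2) for d, f in zip(as_double, as_float)
)
```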

 Action items

 Decisions