2022-03-29 QC Meeting notes

 Date

Mar 29, 2022

 Participants

  • @Pavan Behara

  • @David Dotson

  • Ben Pritchard

  • @Chapin Cavender

  • @Jeffrey Wagner

 Goals

  • Updates from MolSSI

  • Infrastructure needs/advances

  • Throughput status

    • OpenFF dipeptides torsiondrives v2.1: 25/26 TDs complete

      • A single calculation is not getting any workers (?)

    • OpenFF Protein Capped 1-mer Sidechains v1.1: from 29 to 36/46 TDs complete

      • CC submitted a revised version last week with modified initial structures

    • SPICE PubChem Set 1 Single Points Dataset v1.2: 128 calculations fail consistently with SCF convergence or MBIS charge convergence errors; these can be moved to end-of-life

    • SPICE PubChem Set 2 Single Points Dataset v1.2: progressed from 40538 to 84398 calcs. Still making progress, but throughput has slowed in the last few days (from around 8K to 10K calcs a day down to 4K a day)

  • User questions/issues

  • Science support needs

 Discussion topics

Item

Notes


Updates from MolSSI

  • BP – A copy of the new server is up and running and ready for testers.

    • It only has some datasets so far. Not sure what to do next. To install, check out the next branch of qcfractal-compute and qcportal.

    • What’s the best way to get feedback? GH Issues may be a bit slow, though we could use a tag. Could also use an issue template for next-related issues.

    • Slack channel on QCArchive could work.

  • DD – From OpenFF’s perspective, I’m interested to see how the submission-compute-retrieval cycle looks now. So I may make a copy of qca-dataset-submission and try submitting old datasets to see how they go.

    • BP – That sounds like a great idea. Except that there may be problems if the dataset has been seen before. But feel free to submit experimental sets, since they’ll be deleted after the testing period.

    • DD – Ok, I’ll plan on basically resubmitting a recent dataset to see how it goes through. And the submission machinery should be just a fork of qca-dataset-submission.

    • BP – That should work. The test server has data from around last September, so newer datasets should look unique. Previously existing usernames and passwords should work.

    • CC + PB – I’m mostly interested in testing data retrieval and checking the status of ongoing datasets.

    • DD will make a copy of qca-dataset-submission and try resubmitting an old dataset that the migrated server doesn’t know about: probably a protein TorsionDrive set and a big single-point set.

    • BP – The managers should be roughly the same (just be sure to use the qcfractal-compute repo now). Also datasets must now have tags. The old URL will work, and the connections using the new versions will get routed to the new server.

    • BP will post install instructions in the new Slack channel.

    • BP – No docs yet, I’ll need to get those built.

  • BP – Reaction datasets aren’t done yet, but that doesn’t matter for you.


Infrastructure needs/advances

  • DD – I’ve got the singlepoint-after-optimization functionality on my radar, haven’t made progress on that yet. Hoping to do so this week.

Throughput status

  • OpenFF dipeptides torsiondrives v2.1: 25/26 TDs complete

    • A single calculation is not getting any workers (?)

    • CC – This is the same one that’s been running for a long time or erroring out with a geomeTRIC error. So we can call this EOL.

    • JW – Is the 26th one usable? Or do you need to do manual intervention to make that usable?

    • CC – I’ll need to do manual intervention to pick one of the other optimizations from that grid point.

    • BP – Could that be automated in the future, and the torsiondrive marked complete?

    • CC – There should be OTHER optimizations that completed on that grid point, so we’d likely be fine taking the next-lowest-energy one.

    • BP – I wonder if we could update the TorsionDrive package to be tolerant of a small number of failures - basically the TorsionDrive package sees all the optimizations and reports a status. So we could change how the status is reported.

    • DD – Would there be a strategy of submitting duplicates that could get around this problem?

      • CC – That would work. I think that’s what DCole’s group does - Basically they seed

      • JW – Each grid point consists of multiple opts as part of wavefront propagation.

      • CC – That’s true, but starting from multiple confs would make it more likely that at least ONE of them completes. Then we could look at the different TorsionDrive jobs and either find the one that’s totally complete, or mark the group of TorsionDrives complete once they can be stitched together to cover all grid points.

      • DD – I’m suggesting we do independent replicates for each Torsiondrive in a set; not that these replicates have any awareness of each other, because they don’t in the current implementation. I’m saying we can make use of this.

    • PB – What’s the root cause here? Bad initial geometry?

      • CC – Maybe. But it’s important to note that there are like 10 completed opts on the grid point, it’s just the 11th that’s not running (but that’s unlikely to be the minimum energy one anyway).

    • PB + CC – Let’s move this to EOL

  • OpenFF Protein Capped 1-mer Sidechains v1.1: from 29 to 36/46 TDs complete

    • CC submitted a revised version last week with modified initial structures

    • PB – Acceptable progress?

      • CC – The progress looks good. I noticed that one structure got in with a bad initial structure so I’ll submit a PR this week that replaces that one.

      • DD – I still see this moving forward.

      • CC – So, don’t move to EOL yet.

  • SPICE PubChem Set 1 Single Points Dataset v1.2: 128 calculations fail consistently with SCF convergence or MBIS charge convergence errors; these can be moved to end-of-life

  • SPICE PubChem Set 2 Single Points Dataset v1.2: progressed from 40538 to 84398 calcs. Still making progress, but throughput has slowed in the last few days (from around 8K to 10K calcs a day down to 4K a day)

    • DD – This bottleneck is genuinely on the compute side: we’re getting a little throttled on PRP and Lilac. We’re running SPICE only on PRP.

    • DD – We’ll be getting access to Max Planck cluster resources via Bert de Groot. Not sure how much throughput to expect. Very nice of Bert to offer that. It’s mostly a GPU cluster, so I’m not sure what to expect in terms of CPU power. With 128 GB RAM max, it may not be suitable for SPICE, but it may be fine for regular jobs. Note that we’ll be at very low priority, so I’m not sure how much throughput we’ll get.
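
The dipeptide discussion raises two related ideas: letting a TorsionDrive report completion when a few optimizations fail (taking the lowest-energy optimization that did complete at a grid point), and stitching independent replicate TorsionDrives so that every grid point is covered. Below is a minimal Python sketch of that selection logic. It assumes a hypothetical in-memory layout (a dict from grid point to a list of optimization records carrying a status string and a final energy); OptRecord, best_per_grid_point, and tolerant_status are illustrative names, not part of the TorsionDrive or QCFractal APIs. It also ignores the fact that, in the real workflow, later optimizations are seeded from earlier ones via wavefront propagation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical record type: NOT the QCFractal/TorsionDrive data model.
@dataclass
class OptRecord:
    status: str               # e.g. "COMPLETE", "ERROR", "INCOMPLETE"
    energy: Optional[float]   # final energy in hartree; None if unfinished

# A grid point is a tuple of dihedral angles in degrees, e.g. (-165,) for a 1-D scan.
GridPoint = Tuple[int, ...]
TorsionDrive = Dict[GridPoint, List[OptRecord]]


def best_per_grid_point(replicates: List[TorsionDrive]) -> Dict[GridPoint, Optional[OptRecord]]:
    """Pool the optimizations of one or more independent replicate TorsionDrives
    and keep the lowest-energy COMPLETE optimization at each grid point
    (None if no optimization at that grid point finished)."""
    best: Dict[GridPoint, Optional[OptRecord]] = {}
    for td in replicates:
        for gp, opts in td.items():
            best.setdefault(gp, None)
            for opt in opts:
                if opt.status != "COMPLETE" or opt.energy is None:
                    continue  # skip failed or still-running optimizations
                current = best[gp]
                if current is None or opt.energy < current.energy:
                    best[gp] = opt
    return best


def tolerant_status(best: Dict[GridPoint, Optional[OptRecord]]) -> str:
    """COMPLETE once every grid point has at least one usable optimization,
    even if some individual optimizations failed."""
    if best and all(rec is not None for rec in best.values()):
        return "COMPLETE"
    return "INCOMPLETE"


# Usage: replicate A errored at 180 degrees, replicate B finished that grid point.
rep_a = {(0,): [OptRecord("COMPLETE", -1.00)], (180,): [OptRecord("ERROR", None)]}
rep_b = {(0,): [OptRecord("COMPLETE", -1.02)], (180,): [OptRecord("COMPLETE", -0.95)]}

print(tolerant_status(best_per_grid_point([rep_a])))  # INCOMPLETE
stitched = best_per_grid_point([rep_a, rep_b])
print(tolerant_status(stitched))                      # COMPLETE
print(stitched[(0,)].energy)                          # -1.02
```

Pooling all replicates before choosing per grid point is what makes the replicates useful without them being aware of each other, matching DD's point that independent replicates need no coordination in the current implementation.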

User questions/issues

  • DD – Upcoming dataset needs?

    • PB – Not that I know of. But roughly 400K more SPICE jobs are ready for submission, so we won’t be troubled by idle compute.

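
The SPICE figures reported under Throughput status also allow a quick estimate of how long the queued work would keep the compute busy. This is a back-of-envelope sketch that assumes constant daily rates (in practice throughput varies with PRP and Lilac throttling) and assumes the 40538 figure was recorded at the previous report; days_to_clear is an illustrative helper name.

```python
# Rough throughput arithmetic using the approximate figures from these notes.
SET2_PREVIOUS = 40_538   # SPICE PubChem Set 2 calcs at the previous report (assumed)
SET2_CURRENT = 84_398    # SPICE PubChem Set 2 calcs at this meeting
QUEUED_SPICE = 400_000   # "roughly 400K" SPICE jobs ready for submission


def days_to_clear(n_jobs: int, calcs_per_day: int) -> float:
    """Days needed to finish n_jobs at a constant daily rate (a simplification)."""
    return n_jobs / calcs_per_day


completed_since_last_report = SET2_CURRENT - SET2_PREVIOUS
print(f"Set 2 progress since last report: {completed_since_last_report:,} calcs")

for rate in (4_000, 8_000, 10_000):  # current rate vs. the earlier 8K to 10K/day
    print(f"{rate:,} calcs/day -> ~{days_to_clear(QUEUED_SPICE, rate):.0f} days")
```

At the current ~4K calcs/day, ~400K jobs would take on the order of 100 days, or roughly 40 to 50 days at the earlier 8K to 10K/day rate, which is consistent with PB's point that compute won't sit idle.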

Science support needs

 

 Action items

 Decisions