2022-03-15 QC Meeting notes

 Date

Mar 15, 2022

 Participants

  • @Pavan Behara

  • @David Dotson

  • @Jeffrey Wagner

  • Benjamin Pritchard

  • @Chapin Cavender

  • @Joshua Horton

  • @Trevor Gokey

  • @Simon Boothroyd

 Goals

  • Updates from MolSSI

  • Infrastructure needs/advances

  • Throughput status

    • OpenFF ESP Industry Benchmark Set v1.1: last 3 jobs struggling to get workers(?), added compute label

    • Openff dipeptides torsiondrives v2.1: 24/26 TD complete

      • Dead slow (CC: of the two remaining, one may error out, so if we reach 25/26 we can move this to end-of-life)

    • OpenFF Protein Capped 1-mer Sidechains v1.0: 5/46 TD

      • 2500+ errors, should we be concerned?

    • SPICE PubChem Set 1 Single Points Dataset v1.2: Error cycling is not working on this now (looks like workers are running out of memory; exit code 137 corresponds to SIGKILL, sent by the OS unless the job was manually stopped), last 16K or so calculations

    • SPICE PubChem Set 2,3 Single Points Dataset v1.2: waiting for workers

  • User questions/issues

  • Science support needs

 Discussion topics

Item

Notes

Updates from MolSSI

  • BP – Since deleting the data before the last meeting, the server seems to be doing fine.

  • BP – Making the last big push on the refactor. This is on reaction datasets, which I don’t think OFF uses. Once I move over ReactionServices, I think that’s the last big change I expect to make.

  • BP – After that, I’ll make a migrated server available for testing. This will contain all of your data up to last September. While that happens, I’ll mostly be taking feedback and making smaller interface changes.

  • BP – So, I expect to get to that at the end of this week. Then I could use a week for the smaller tasks. So we might be looking at the QCF release at the beginning of April.

  • BP – I could either spin up two separate servers, or spin up the old and new archive simultaneously at one address.

    • JW – No strong preference

    • BP – I think I’ll do the “two physical servers” approach

  • JW: While migration is going on should we shut down compute on our end?

    • BP - I would make sure the database migration doesn’t happen on a live server

    • DD - We can sit together and plan for it, I would be happy to help

Infrastructure needs/advances

  • DD – I’m pushing a QCEngine release to support requested features in QCSubmit.

  • PB – We still have to compute pubchem set 2 and 3, so even if the QCSubmit release is delayed, our compute will be saturated for a while.

  • DD – The QCEngine and QCSubmit releases are already made, and I’ve updated the managers.

Throughput status

  • OpenFF ESP Industry Benchmark Set v1.1: last 3 jobs struggling to get workers(?), added compute label

    • PB – I added the compute-openff tag to see if this would get picked up

    • DD – It’s possible that they’re already running; they’d still show as incomplete. Unfortunately, it’s hard to track their status in a detailed way.

    • SB – Is there a way to know whether they’re really incomplete vs. taking a long time to run? I know sometimes jobs get desynchronized and the server and manager disagree about their status.

    • BP – In rare cases, a job can be complete, but the server thinks it’s incomplete.

    • SB (later in meeting) – I just checked and the 3 remaining jobs are now complete. I’ll report back if there are any remaining issues.

  • Openff dipeptides torsiondrives v2.1: 24/26 TD complete

    • Dead slow (CC: of the two remaining, one may error out, so if we reach 25/26 we can move this to end-of-life)

  • OpenFF Protein Capped 1-mer Sidechains v1.0: 5/46 TD

    • 2500+ errors, should we be concerned?

    • DD, CC, and PB will meet later today for a working session on this and will try to understand how to keep this moving forward

  • SPICE PubChem Set 1 Single Points Dataset v1.2: Error cycling is not working on this now (looks like workers are running out of memory; exit code 137 corresponds to SIGKILL, sent by the OS unless the job was manually stopped), last 16K or so calculations

    • DD – At this point, we may need to use AWS. JW, what’s the status of that?

      • JW – Spoke with Karmen, made progress but no username/password yet.

    • DD – Will error cycle pubchem set 1 using my personal computer, since it’ll be large enough.

  • SPICE PubChem Set 2,3 Single Points Dataset v1.2: waiting for workers
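Side note on the exit code 137 mentioned for PubChem Set 1: by shell convention, an exit status above 128 means the process was terminated by signal number (status minus 128), so 137 maps to signal 9, SIGKILL. The helper below is a minimal sketch of that arithmetic for reading worker logs. The function name `describe_exit_code` is hypothetical and not part of QCFractal or any manager tooling, and the signal name lookup assumes a POSIX system.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper for reading logs).

    By convention, a status above 128 means the process was terminated by
    signal number (code - 128).
    """
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # 137 - 128 = 9, which is SIGKILL on Linux and macOS.
    print(describe_exit_code(137))
```

The exit code alone does not say who sent the SIGKILL. Attributing it to the kernel's out-of-memory killer, as suspected above, would need confirmation from system logs or the scheduler's job accounting.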

User questions/issues

 

Science support needs

 

 

 

 

 

 Action items

 Decisions