2024-10-07 Westbrook/Wagner Check-in meeting notes

Participants

  • @Brent Westbrook

  • @Jeffrey Wagner

Discussion topics

Item

Notes

General updates

  • BW –

  • JW –

    • I still need to request a higher vCPU limit on AWS. Confirm that this isn’t blocking, just inconvenient?

      • BW – Right, possibly not inconvenient. I’ve tried scaling up to the whole dataset and found a molecule that crashes because of an unassigned torsion. (I’ve talked with LM about this: previously it was filtered out because OE couldn’t assign charges, but the new version of OE can handle it, so now it errors out at the torsion assignment stage.)

      • BW – Also, it seemed to be running really fast, even on 32 cores. tqdm was at different times reporting 40 mins to 2 hours total runtime, whereas HPC3 was always 3+ hours.

      • JW – Sounds great, let me know if your expected usage will be $50 or higher just so I can be aware. Current projected cost for Oct is $2.33 so feel free to use more.

    • Let’s review what we decided on issues LM reported

      • Some old datasets were submitted both as optimizations AND as singlepoints, with both datasets having the same name.

      • We generally run singlepoints because we want fancy properties (hessians mostly) calculated on the final structure from the optimization.

      • The issue is that we have to access the SINGLE POINT record to get the properties, but the CMILES is only stored on the OPTIMIZATION record (for pre-QCF 0.50 datasets).

      • Two Qs:

        • What do we do about this in the short term?

          • Modify QCSubmit to NOT crash if there’s a missing CMILES (since it was supposed to be raising an error)

          • If all records are missing CMILES, how could we warn the users and point them toward useful info for moving forward?

            • BW – We’ll want to think hard about what this warning would say, because the fix isn’t generally “just switch singlepoint to optimization”, since users are probably trying to load a singlepoint to get properties that aren’t on the optimization.

        • Is this blocking science? Do we have a workaround?

          • BW – I don’t think this is blocking science. LM linked a script in the Slack chat about this that contains a workaround. But there’s lots of other stuff too.

        • BW – Also, I put in a workaround for this where you can snatch CMILES for a singlepoint by searching QCA for optimizations with the same input molecule and taking their CMILES. But this is hacky enough that we shouldn’t actually do it.

          • JW – Agree, but it’s cool that you managed to do that.

        • BW – I think the CMILES problem might be with QCA?

          • JW – Historically, I think we weren’t able to put CMILES on Molecules, and the closest place we could find to put them was on entries. But for some reason we only put them on optimization entries; maybe single point entries didn’t have a space for them or something. But this is a historical problem, and QCEl Molecules in the QC*>0.50 stack have a spot to store CMILES.

        • BW – Could we just update our previous datasets to include CMILES?

          • JW – Maybe we could update them in place; maybe we could submit new versions of them with correct CMILES?

      • Coordinates precision issue

        • We’ll discuss at tomorrow’s QC submission meeting
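Sketch of the short-term missing-CMILES behavior discussed above (don’t crash on a missing CMILES, and warn more loudly if every record is missing it). This is a rough illustration only, not QCSubmit code: the function name `split_by_cmiles`, the dict-shaped records, and the `"id"` / `"cmiles"` keys are all hypothetical, and setting aside records without CMILES is an assumption. The warning text is a placeholder, since the real wording still needs thought.

```python
import warnings


def split_by_cmiles(records):
    """Separate records with CMILES from those without, warning instead of raising.

    `records` is assumed to be a list of dicts with hypothetical "id" and
    "cmiles" keys; this is NOT the real QCSubmit / QCPortal record type.
    """
    kept = []
    missing_ids = []
    for rec in records:
        if rec.get("cmiles"):
            kept.append(rec)
        else:
            # Assumption: set the record aside rather than crash.
            missing_ids.append(rec["id"])

    if missing_ids and not kept:
        # Every record lacks CMILES: likely a pre-QCF 0.50 singlepoint dataset.
        warnings.warn(
            f"All {len(records)} records are missing CMILES. For pre-QCF 0.50 "
            "datasets, CMILES was stored only on the optimization records.",
            stacklevel=2,
        )
    elif missing_ids:
        warnings.warn(
            f"{len(missing_ids)} of {len(records)} records are missing CMILES.",
            stacklevel=2,
        )
    return kept, missing_ids
```

Returning the missing IDs alongside the kept records leaves it to the caller to decide what to do with them, which avoids baking “just switch to the optimization dataset” into the tool.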

Trello

https://trello.com/b/dzvFZnv4/infrastructure

Action items

Decisions
