2025-01-06 to 10 Clark/Wagner Check-in meeting notes

Discussion topics

Item

Notes

2025-01-06

  • JCl - HDF5 download from QCA - Can you join science meeting today at 6 PM eastern?

    • JW – Yes, I’ll be there

  • JW – This is a good time to book for Irvine in-person week (Feb 17-21). It's easy to fly to Santa Ana “John Wayne” airport (code “SNA”) and stay at an airport hotel (I’m at the Hampton Inn across the street).

    • JCl – Should I plan to fly out Sun?

    • JW – Your choice, but try to be available onsite Monday at 9 AM, and fly back Fri night/Sat morning. (If you choose to stay extra days we’ll still cover your flight home, as long as the price is comparable to flying Fri night/Sat morning.)

    • JCl – Do I need to prepare anything?

      • JW – Not currently, we’ll let you know if that changes.

  • JW – Next Mon we’ll start formal iteration planning with whole org (this is new for all of us)

    • JCl – I already have a big project plan for organometallics. Thinking of making one for dataset longevity. Should I do this in Confluence or hold off and do it on ZenHub?

      • JW – Do it on Confluence.

Tasks to pick from

  • Sage 2.0 dataset cleanup/redo

    • JCl – A possible follow-on from this would be adding an option to QCSubmit that reports whether a new submission would create new records on the server (as opposed to just reusing existing ones)

  • Taking over NRP compute

  • Debugging organometallics

  • Toolkit implicit Hs PR

    • JW – This should be your top priority on the infrastructure side

    • And possibly further upstream changes

  • Dataset longevity planning (HDF5 downloads, Sage record consolidation, Zenodo upload/other cold storage)
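
A minimal sketch of the "would this submission create new records?" check floated under the Sage 2.0 cleanup item. Everything here is an assumption for illustration: `RecordKey`, its fields, and `split_submission` are hypothetical names, not part of QCSubmit or QCFractal, and the real server may decide record identity differently.

```python
from typing import NamedTuple


class RecordKey(NamedTuple):
    """Hypothetical identity of a QC record (assumed, not the server's real rule)."""
    molecule_hash: str
    method: str
    basis: str
    program: str


def split_submission(planned, existing):
    """Partition planned keys into (new, reused), keeping order and dropping duplicates."""
    existing = set(existing)
    new, reused, seen = [], [], set()
    for key in planned:
        if key in seen:
            continue  # the same calculation listed twice only counts once
        seen.add(key)
        (reused if key in existing else new).append(key)
    return new, reused


# Made-up example data: two records already on the server, three planned entries.
existing = {
    RecordKey("mol-a", "b3lyp-d3bj", "dzvp", "psi4"),
    RecordKey("mol-b", "b3lyp-d3bj", "dzvp", "psi4"),
}
planned = [
    RecordKey("mol-a", "b3lyp-d3bj", "dzvp", "psi4"),
    RecordKey("mol-c", "b3lyp-d3bj", "dzvp", "psi4"),
    RecordKey("mol-c", "b3lyp-d3bj", "dzvp", "psi4"),
]

new, reused = split_submission(planned, existing)
print(f"{len(new)} new record(s), {len(reused)} reused")
```

In a real implementation the `existing` set would have to come from a query against the server; that step is left out because it depends on the client API.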

NRP onboarding

https://github.com/openforcefield/qca-dataset-submission/tree/master/NRP

2025-01-07

  • NRP onboarding

    • JC got kubeconfig file

    • JC will work out onepass access for tomorrow

    • JC will go through NRP quick start

  • PR review

    • (finished+approved)

2025-01-08

  • Get NRP workers running

  • JW sent JC credentials via ProtonMail

2025-01-09

  • Get JC started running QC runners

  • JC – Infrastructure Q: The Chodera lab (e.g. SPICE) uses half optimized and half unoptimized geometries to fit their FF. (The forces are taken from the negative gradient rather than from a Hessian at the minimized geometry.) To generate these unoptimized geometries, Chris Iacovella is running tblite in ASE to get MD frames of TM complexes. He wants to put this functionality into our calculator pipeline.

    • JW – This sounds like a different way to get starting conformers. When we submit a QC dataset, we have two general ways of getting starting conformers

      • 1) Sometimes our starting conformers are external/magic (ex: make a submission where we just “have” some XYZ coordinates to start from, and kinda toss the script that made those XYZ coords into the submission directory on QDS)

      • 2) Sometimes our starting conformers are from a previous QC dataset (like, running an opt dataset, and starting single point calcs from the end points of those)

    • JC – CI is asking whether we can make a new black box for generating conformers, mostly aligning with providing something for the “external/magic” box in category 1.

    • JW – This seems like a cost/benefit question: “should we make a new product to generate confs using tblite?”. I’m generally the person who can estimate costs, LW/JE/DM will need to weigh in on benefit value.

      • This would be easy to implement - QCEngine can do most of the work, we just do some wrapping around it and file formatting (e.g. getting QCEngine outputs back to SDF format). If there are missing features this may require coordinating with MolSSI/Psi4 devs, and we don’t have a contractual relationship with them (we can only ask nicely for things)

      • This would be medium/hard to maintain: tblite is complex and isn’t in our existing dependency stack, so this is a complex upstream we’d be accepting (including all ITS upstreams). It might be more manageable if we could somehow define this as not-a-product (more like a one-time artifact).

      • This would be hard to do “reproducibly” - Anything with nonzero temperature simulation is a mess.

    • JC – While onboarding with OpenFF methods moves ahead, the JC lab side is moving fast and will do this whether or not we support it.

    • JW – I’d be fine with them creating a script to do this tblite-based conf gen and just putting it in the submission directory.

    • JC – Could add a tblite wrapper to QCEngine. Then could make a new kind of dataset, like TorsionDrive, that generates many conformers per input.

    • Should we make a tblite wrapper for QCEngine?

      • JW – Maybe eventually, but if all CI needs for urgent goals is some properties, he’ll be better served by making a Python script to parse raw output and get the numbers of interest.

    • Should we make some sort of dataset like TorsionDrives to generate confs using tblite? Should it go in QCSubmit?

      • JW – I think this is all too young to put into a library.

      • JC – Could make a new dataset type for QCSubmit for tblite MD, but with dataset classes defined in a repo owned by Chodera Lab.

      • JW – To get this new type of dataset to meaningfully run as a QCF dataset will also require changes to QCF, which will be a big lift.

      • JC+JW: In summary, this looks like a longer turnaround time than continuing with CI's current approach of running tblite on their HPC, but it is something to keep an eye on and consider integrating if it proves valuable in the future.
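
A minimal sketch of the kind of Python script JW mentions for parsing raw output to get the numbers of interest. The sample text, the regular expression, and the function names are assumptions for illustration only; the sample is not real tblite output, so the pattern would need to be adjusted against an actual log.

```python
import re

# ASSUMPTION: invented sample text, not tblite's real log format.
SAMPLE_OUTPUT = """\
step 1
total energy   -12.345678 Eh
step 2
total energy   -12.400000 Eh
"""

# Hypothetical pattern: the words "total energy", whitespace, then a signed float.
ENERGY_PATTERN = re.compile(r"total energy\s+(-?\d+\.\d+(?:[eE][-+]?\d+)?)")

HARTREE_TO_KCAL_PER_MOL = 627.5094740631


def parse_energies(text, pattern=ENERGY_PATTERN):
    """Return every energy matched by `pattern`, in file order, as floats."""
    return [float(match.group(1)) for match in pattern.finditer(text)]


def relative_energies_kcal(energies):
    """Energies relative to the lowest one, converted from Hartree to kcal/mol."""
    if not energies:
        return []
    lowest = min(energies)
    return [(energy - lowest) * HARTREE_TO_KCAL_PER_MOL for energy in energies]


energies = parse_energies(SAMPLE_OUTPUT)
relative = relative_energies_kcal(energies)
for index, (energy, rel) in enumerate(zip(energies, relative), start=1):
    print(f"frame {index}: {energy:.6f} Eh, {rel:.2f} kcal/mol above lowest")
```

Keeping the pattern as a parameter means the same two functions can be pointed at other quantities in the log without changing the rest of the script.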

Action items

Decisions