2025-01-09 JCl/LW check-in

Participants

  • @Lily Wang

  • @Jennifer A Clark

Goals

  • Flights for workweek

  • Questions about prep for TM meeting

  • Misc QCA stuff

Discussion topics

Item

Presenter

Notes

Flights for onsite workweek

 

  • JCl: What did you think of the cost increase to fly in on Sunday vs. Monday from my local airport?

  • LW: What do you want to do? We knew this would add extra cost so we accounted for some amount of money with the review board.

  • JCl: I think if I arrive on Monday, that day will be wasted; it’s better if I come on Sunday.

  • JCl: How are you getting from LAX? Uber will likely be >$80 and a train is 2.5 hr min.

  • LW: DM gave me the number for a car service.

Newport Town Car Service
Corona Del Mar, CA 92625
949-351-1815, [krd35@juno.com](mailto:krd35@juno.com)

External Conference

 

  • JCl: Since I expect part of my performance is rated on external communication, yet I have budget constraints, I was thinking I could go to ACS Fall in DC this year. I have friends I can stay with at no cost. (Registration ~$450 for early bird since I’m a member.)

  • LW: Yes that is an expectation and we have negotiated more wiggle room in the budget to accommodate your travel costs.


QCA Datasets



  • JCl: Lexie requested reviews from each of us. I’ll give a first pass and then let you
    What do you think about having reviewers add the dataset type and/or driver to the metadata? I’ve been trying to pull the SPICE dataset but seem to be getting the dataset type wrong because I can’t pull it.

  • LW: Make sure you’re using the correct client address. Also, use the list_datasets feature of QCPortal to search for it.

…some time later

  • JCl: Wrapping back around to datasets. When comparing the records and our OpenFF Sage 2.0.0 dataset, the new dataset has spec keywords that the records don’t. I went back and reran the script and those keywords are not in my qca-dataset-submission.
    It appears that the spec keywords were added after sending to QCPortal. Is that something I should ask Ben about? This goes to show the value of the “dry-run” feature we discussed in the QCSubmit users meeting.

  • LW: Wow that’s frustrating, you might ask Jeff since I expect it has to do with the migration of QC to the new version, but then yes, Ben would be the person to ask.
    It seems like you’ll have to bypass qcsubmit and make a new collection on QCPortal directly.

  • JCl: So I won’t use the qca-dataset-submission repo?

  • LW: That’s right, relabel the new collection with qcportal create_dataset. Don’t forget to cancel the records for Sage 2.0.0.

  • JCl: Jeff has found those convenient for NRP training, so they are sticking around for now.
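
Note on the keyword mismatch discussed above: a minimal sketch of how two sets of specification keywords could be compared once both are available as plain dictionaries. The function name `diff_keywords` and the example keys and values are hypothetical and for illustration only; they are not taken from QCPortal, QCSubmit, or the actual datasets.

```python
def diff_keywords(submitted: dict, stored: dict) -> dict:
    """Report how `stored` keywords differ from `submitted` keywords."""
    # Keys present only in the stored specification.
    added = {k: stored[k] for k in stored.keys() - submitted.keys()}
    # Keys present only in the submitted specification.
    removed = {k: submitted[k] for k in submitted.keys() - stored.keys()}
    # Keys present in both, but with different values.
    changed = {
        k: (submitted[k], stored[k])
        for k in submitted.keys() & stored.keys()
        if submitted[k] != stored[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical keyword sets standing in for the submission and the records.
submitted_keywords = {"maxiter": 200, "scf_properties": ["dipole"]}
stored_keywords = {"maxiter": 200, "scf_properties": ["dipole"], "extra_flag": True}

report = diff_keywords(submitted_keywords, stored_keywords)
print(report)
```

A non-empty "added" entry would correspond to the situation described above, where keywords appear on one side that were never in the submission.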

TM-FF

 

  • JCl: Can we move the science team meeting to before the TM-MLFF meeting next week (4pm EST, 8AM SYD) so I can review my slides with you? Also, Jeff is interested in attending.

  • LW: Yeah, sounds good. I’ll move it and add him.

  • JCl: I chatted with Chris Iacovella today and because of their need for initial data they are moving quickly ahead with a conformer generation scheme that he thinks should be integrated as a calculator into our pipeline. I also spoke with Jeff about it in our onboarding 1:1.

  • LW: It seems like you and Jeff agree that implementing this into the pipeline isn’t a priority, is that right? I would tend to agree.

  • JCl: Yes that’s right. Chris expects to generate these conformers of all of tmQM into HDF5s and put them on Zenodo. I think the expectation is that I will pull these down and use the maxiter functionality that Lexie is working on for Achira to do a few optimization steps and then output the properties. I think I could do that with the optimization driver, but maybe I’ll do the opt and then a single point with the gradient driver.

  • LW: Oh the gradient driver?

  • JCl: Yes, the way SPICE was done, and the way the Chodera Lab prefers to do it, diverges from the OpenFF standards in that ~half the dataset is not a minimized structure, so instead of Hessians we will have to use numerical gradients as used (and hopefully output) during minimization.
    Since Chris is only trying to get some amount of data, I expect that I’ll then take over his methods to run GFN-xTB MD on the other datasets I’ll be pulling.

  • LW: I can see how the asks are growing; this is why we use project plans to define and restrain our tasks. You said he saw yours and approved of it?

  • JCl: Yes, I think I’ll have to add these to it… I might make the Approach 2 document to illustrate what the steps are.

  • LW: Sounds good, next time we have a meeting with them, we can get final approval of those documents since so far it’s been just from me.

  • JCl: Ok sounds good. I do have concerns about the unoptimized structures. These DFT calculations rely on a harmonic approximation around the minimum. If we move away from that minimum, it should in theory be fine in small amounts, but this is really what DFT-MD engines are for, where anharmonic contributions are added, such as in CP2K (in ASE but not QCEngine). Setting a number of equilibration steps may not be good enough if it’s not close, but we need a standard to test against. I know a maintainer of CP2K I might bounce this off of, but maybe there’s something in the literature. I don’t want to hold up progress, and SPICE did it this way, so with the timeline it seems like my concerns should be put aside.

  • LW: You’re the DFT expert in the room, you might look in the literature and then put anything you find in the slack. Disagreements like this are why we have hierarchy, we might have you present this to the leadership team.

  • JCl: Ok great. I have questions about the slides for Monday. Since the strategy document was a brain dump of properties, I thought I would break those down and illustrate what we plan to do. Then I’ll talk about the statistics of Brent’s dataset. What stats do you think they’ll expect?

  • LW: Sounds good, keep it to 4/5 slides. Brent’s dataset was rather small (44 molecules), so the statistics won’t be so meaningful. You might go through the errors and gain insight into patterns of what failed and why.

  • JCl: Ok will do, how do I access those?

  • LW: Download the dataset and look at the stdout and stderr of the records, through QCA not qcsubmit.

  • JCl: How do we go about asking them for our “metrics of success” in the project plan? Ultimately they care about the FF being done so we need to provide data to make that happen but that’s an undefined target.

  • LW: We need them to define what is “good enough” in the forcefield, so our goal is to support that. Ask them something like: “When will you be happy with this forcefield? What metric should we use to measure how ‘good’ the forcefield is?”

Action items

@Lily Wang will follow up with leadership team about JCl’s flight options

Decisions