JCl: Can we move the science team meeting to before the TM-MLFF meeting next week (4pm EST, 8am SYD) so I can review my slides with you? Also, Jeff is interested in attending.
LW: Yeah sounds good, I’ll move it and add him
JCl: I chatted with Chris Iacovella today and because of their need for initial data they are moving quickly ahead with a conformer generation scheme that he thinks should be integrated as a calculator into our pipeline. I also spoke with Jeff about it in our onboarding 1:1.
LW: It seems like you and Jeff agree that implementing this in the pipeline isn’t a priority, is that right? I would tend to agree.
JCl: Yes that’s right. Chris expects to generate these conformers of all of tmQM into HDF5s and put them on Zenodo. I think the expectation is that I will pull these down and use the maxiter functionality that Lexie is working on for Achira to do a few optimization steps and then output the properties. I think I could do that with the optimization driver, but maybe I’ll do the opt and then a single point with the gradient driver.
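Roughly what I have in mind, sketched at the QCEngine level (the molecule, the GFN2-xTB model, and the maxiter value are placeholders, I’m assuming geomeTRIC accepts a maxiter/converge keyword this way, and the real pipeline would go through the Achira tooling rather than raw QCEngine):
```python
# Truncated optimization followed by a single point with the gradient driver.
# Sketch only: assumes QCEngine with the geomeTRIC procedure and the xtb harness
# installed; molecule, model, and maxiter are illustrative placeholders.
import qcengine
from qcelemental.models import AtomicInput, Molecule, OptimizationInput
from qcelemental.models.procedures import QCInputSpecification

mol = Molecule.from_data("""
O  0.000  0.000  0.000
H  0.000  0.757  0.586
H  0.000 -0.757  0.586
""")

opt_input = OptimizationInput(
    initial_molecule=mol,
    keywords={
        "program": "xtb",
        "maxiter": 5,              # stop after a few steps
        "converge": ["maxiter"],   # assumption: treat hitting maxiter as "done"
                                   # (geomeTRIC option; behavior is version-dependent)
    },
    input_specification=QCInputSpecification(
        driver="gradient",
        model={"method": "GFN2-xTB", "basis": None},
    ),
)
opt_result = qcengine.compute_procedure(opt_input, "geometric")

# Single point with the gradient driver on the (partially) optimized geometry.
sp_input = AtomicInput(
    molecule=opt_result.final_molecule,
    driver="gradient",
    model={"method": "GFN2-xTB", "basis": None},
)
sp_result = qcengine.compute(sp_input, "xtb")
print(sp_result.return_result)  # the nuclear gradient
```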
LW: Oh the gradient driver?
JCl: Yes, the way SPICE was done and the way the Chodera Lab prefers to do it diverges from the OpenFF standards: roughly half the dataset is not a minimized structure, so instead of Hessians we will have to use the numerical gradients used (and hopefully output) during minimization.
Since Chris is only trying to get some amount of data, I expect that I’ll then take over his methods to run GFN-xTB MD on the other datasets I’ll be pulling.
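If it comes to that, the MD piece could look something like this through ASE with the xtb-python calculator (just an illustration of the setup, not Chris’s actual scripts; the structure, timestep, temperature, and run length are placeholders):
```python
# Sketch of GFN2-xTB MD via ASE; assumes the xtb-python package is installed.
from ase import units
from ase.build import molecule
from ase.md.langevin import Langevin
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from xtb.ase.calculator import XTB

atoms = molecule("H2O")                 # placeholder structure
atoms.calc = XTB(method="GFN2-xTB")

MaxwellBoltzmannDistribution(atoms, temperature_K=300)
dyn = Langevin(atoms, timestep=0.5 * units.fs,
               temperature_K=300, friction=0.01 / units.fs)
dyn.run(100)                            # short sampling run
```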
LW: I can see how the asks are growing; this is why we use project plans to define and constrain our tasks. You said he saw yours and approved of it?
JCl: Yes, I think I’ll have to add these to it… I might make the Approach 2 document to illustrate what the steps are.
LW: Sounds good, next time we have a meeting with them, we can get final approval of those documents since so far it’s been just from me.
JCl: Ok sounds good. I do have concerns about the unoptimized structures. These DFT calculations rely on a harmonic approximation around the minimum. Moving away from that minimum should in theory be fine in small amounts, but this is really what DFT-MD engines are for, where anharmonic contributions are included, such as in CP2K (which has an ASE calculator but no QCEngine harness). Setting a number of equilibration steps may not be good enough if the starting structure isn’t close, but we need a standard to test against. I know a maintainer of CP2K I might bounce this off of, but maybe there’s something in the literature. I don’t want to hold up progress, and SPICE did it this way, so with the timeline it seems like my concerns should be set aside.
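For reference, the ASE route I mean is something like the following (assuming a working cp2k_shell binary on PATH; the water molecule and PBE settings are placeholders, not a recommendation):
```python
# Illustration of driving CP2K through ASE, since no QCEngine harness exists for it.
from ase.build import molecule
from ase.calculators.cp2k import CP2K

atoms = molecule("H2O")        # placeholder system
atoms.center(vacuum=4.0)       # CP2K needs a simulation cell
atoms.calc = CP2K(xc="PBE")    # placeholder functional/settings

print(atoms.get_potential_energy())   # the same calculator can drive ASE MD,
print(atoms.get_forces())             # e.g. ase.md.langevin.Langevin as above
```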
LW: You’re the DFT expert in the room; you might look in the literature and then put anything you find in the Slack. Disagreements like this are why we have a hierarchy; we might have you present this to the leadership team.
JCl: Ok great. I have questions about the slides for Monday. Since the strategy document was a brain dump of properties, I thought I would break those down and illustrate what we plan to do. Then I’ll talk about the statistics of Brent’s dataset. What stats do you think they’ll expect?
LW: Sounds good, keep it to 4-5 slides. Brent’s dataset was rather small (44 molecules), so the statistics won’t be very meaningful; you might go through the errors and gain insight into patterns of what failed and why.
JCl: Ok will do, how do I access those?
LW: Download the dataset and look at the stdout and stderr of the records, through QCA rather than qcsubmit.
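If you have the newer QCPortal client, something along these lines should get you there (the dataset type and name below are placeholders, and the exact attribute names depend on the QCPortal version installed):
```python
# Sketch: pull error records from QCArchive and inspect their output.
# Assumes the post-0.50 QCPortal API; dataset type and name are placeholders.
from qcportal import PortalClient

client = PortalClient("https://api.qcarchive.molssi.org")
ds = client.get_dataset("optimization", "example-dataset-name")  # hypothetical name

for entry_name, spec_name, record in ds.iterate_records(status="error"):
    print(entry_name, spec_name, record.status)
    print(record.error)   # structured error info for the failure
    print(record.stdout)  # program output, if captured
    print(record.stderr)
```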
JCl: How do we go about asking them for our “metrics of success” in the project plan? Ultimately they care about the FF being done, so we need to provide data to make that happen, but that’s an undefined target.
LW: We need them to define what is “good enough” in the forcefield, so our goal is to support that. Ask them something like: “When will you be happy with this forcefield? What metric would you use to measure how ‘good’ it is?”