2024-12-18 FF fitting meeting

Participants

Julianne Hoeflich
Brent Westbrook
Barbara Morales
Jeffrey Wagner
Pavan Behara
David Mobley
Lily Wang
Chapin Cavender
Matt Thompson
Jennifer A Clark
Julia Rice
Daniel Cole
Michael Shirts

Recording: https://drive.google.com/file/d/10cHuLH_mI-ROq0i_huvCi2qCF-ZmDUeM/view?usp=drive_link

Discussion topics

Item	Presenter	Notes

Lipid refit and benchmarks	Julianne Hoeflich	JW – (slide 2 or 3) For the 3-atom angles, what is the x-axis? It shouldn’t be centered around 0 if it’s the actual measured value/equilibrium angle.
JH – Which FF should we use (aux vs. core)?
BW – For determinism, maybe set the random seed to make them deterministically random.
JH – Right. Just wasn’t sure whether we should overload it with lots of training data. So I wanted to ask about the steps in going toward simulation benchmarks.
CC – Depends on whether we want to make a general FF or a lipid FF.
MS – We want to make a general FF. We know that phosphates and n-alkanes weren’t heavily represented in earlier FFs.
BW – Do you have a good feeling for the proportion of the new data? These are areas where there wasn’t a ton of data before.
MS – Maybe, if we want a generalized FF, we should run other benchmarks (like QM) and see.
LW – QM benchmarks are relatively easy to run using YDS. How expensive are lipid simulation benchmarks?
MS + JH – Relatively inexpensive, a few days. With HMR it’s like 1 µs, maybe in duplicate or triplicate.
LW – Maybe worth doing both and seeing how they go.
MT (chat) – We have small molecule benchmarking fully automated (easy to run on the “industry” benchmarking dataset, a little more work but doable to use a different QM dataset in benchmarking). Bring an OFFXML file and get results back in 5-6 hours.
MS – Interested to see how the QM benchmarks work.
JR – These lipids are anions. Does our QM basis set handle these alright?
BW – Using our normal level of theory, with no additional diffuse functions that you might use with anions.
JR – So no counterion?
BW – Right.
JR – Why no diffuse functions?
BW – Defaulted to our usual level of theory.
LW – This level of theory generally provides a good mix of performance and accuracy.
DM – PB, did we have phosphates in the benchmark?
PB – We’re doing double zeta, which shouldn’t be as bad for anions, but we could pick a better one.
JR – More data on sulfates? How many entries in the training set for these?
DC (chat) – Benchmarking paper https://pubs.acs.org/doi/10.1021/acs.jpcb.4c03167
JH – Prioritize lipid benchmarks vs. QM?
JW – Yeah, definitely do the QM benchmarks.
LW – A generalizable FF is definitely what we’re aiming for. But in protein FF work, we’re willing to accept a regression in small molecule performance to get coverage of proteins.
CC – I’d say try to get lipids right first, and if that shows regressions, then look into splitting parameters.
MS – I suspect that alkane properties… molecules with alkane moieties… might get better from this. Like, I can’t imagine mols needing different params for short vs. long alkanes.
CC – There will probably be some systematic differences between parameters that make mols behave well in solvent vs. membrane.
PB – Only 300 data points different between aux and core datasets? I’d expect them to be about the same in both types of benchmarks.
JH – Right. Hard to know how important the 300 additional data points are.
Hmix calculations with Evaluator	Barbara Morales	PB – General Q – Why does the initial packmol density matter? Won’t the simulation converge on the final density anyway, even if the initial packing isn’t as dense as it should be?
DM – I think it depends on the equilibration timescale. Especially since one of these compounds is rather viscous, it would be hard to detect equilibration. Have you checked out GROMACS' gmx insert-molecules? That may be promising.
BM – Haven’t looked at that.
DC (chat) – Finlay’s been working on equilibration detection, joining my group / OpenFF in March: https://chemrxiv.org/engage/chemrxiv/article-details/6706e8e151558a15efa2a2ba
LW – I don’t think this is a packmol issue. I think it’s evaluator. We found that when we try to give packmol a higher density, it gives a worse answer. Evaluator adds some buffer to atom radius and box size. So we’re thinking that more equilibration is the key.
BM – I don’t have hard numbers on equilibration time here.
MS – Maybe we try… (recording @40 mins)
JC – If evaluator is adjusting the density, is there a chance that there’s a finite-size effect when it gives a box of the wrong size?
BM – I’m not requesting a specific box size.
JC – I see you’re running in NPT, so having a box size of the wrong … shouldn’t be super impactful. So I don’t think the density of packmol matters that much. What comes into my head with the viscosity and degrees of freedom is that these might be a critical issue with diethylamine. So those could be passed in as different mols.
MS – I think the biggest reason that the density wasn’t equilibrating is that the volume wasn’t near… We’re looking at other things.
JC – Could be useful to have an annealing step, running at high temperature before cooling down.
DC – Many years ago, when I was in the Jorgensen group, we had a directory full of equilibrated liquid boxes. Do we want to make something like that here?
MS – For fitting, yes.
DM – It’s of limited value since we have so much chemical diversity.
LW – I’ve been poking around with evaluator to see if this is possible. We’re doing like 1000 runs here, so there’s a good amount of diversity, but if we could cache those within the optimization that’d save a lot of time.
MS – Could map XMLs to equilibrated boxes. Could target exptl density.
BM – We did manage to change the Monte Carlo barostat, but can the effective tau_p be changed somewhere else?
MS – We tried changing the barostat and nothing changed when we changed tau_p, so that…
MT (chat) – Obliged to raise a small alarm bell on doing tons more experiments with temperature control that’s known to be bad. I’m offline for the year at the top of the hour, but somebody on the team may be able to guide Barbara on how to use the dev build or make a release with the thermostat patch.
MS – Probably worth doing …
LW – Changes are in main, can install from there.
MS – There’s a limit to barostat effectiveness wrt fast harmonic bonds. The speed at which the barostat will allow the sim to collapse is limited by the stretching of harmonic bonds (since box resizing shrinks harmonic bonds). So we could test using argon.
CC (chat) – Re: why is the density of TIP3P-FB off? The OpenFF toolkit was assigning incorrect LJ parameters to water for versions <0.16. Check that your OpenMM systems have the right LJ parameters for TIP3P-FB.
MT – At a philosophical level, evaluator has to work on really general stuff. It looks goofy when there’s a viscous fluid that needs special treatment to pack right, but it needs to work in a dead-simple “here’s my exptl data, get me phys props” way with as little hand-holding as possible.
MS – If it takes 20 times longer, then you can’t run an optimization and that’s not useful.
MT – Right, there are tradeoffs. So we could have packmol run for a very long time, but we’ll get kicked off NRP for single-threading/inactivity. But agree that better equilibration detection will help. I believe that LW is setting up vdW refits on NRP already, and packmol runs take 15 mins already; if we did a longer job this would be bad. Also I’m hesitant about caching: that can save some time but also amplify the effect of a bad equilibration.
MS – Easiest cache would be to cache the starting point of the optimization.
MT – Depends on how we define identical input files.
MS – Can test for exact equality.
Issues with Evaluator	Lily Wang	LW – Thinking about solutions: I think an “iteration 0” would be good. It would still leave some single-threaded time in jobs, but things should be rapid after that. Also agree with MT’s thoughts on doing non-generalized things.
DC – From what I get from this discussion, I don’t see anything that should stop new fits. Is there a need to pause?
LW – I think we should pause. If the enthalpies of mixing aren’t equilibrated…
MS – … I think it’d be good to get a better handle on the equilibration. Otherwise we’ll get a lot of stochastic-ness in the properties. I think if we did a vdW opt right now we’d get a lot more noise.
DC – Yeah, I do see some sampling issues in the data as presented.
MS – Yeah, we do see 10-15% error for different sizes of molecules.
JR – Is it worth trying diethylamine and ethylamine?
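
Illustrative sketch (not from the meeting): PB's question about the initial packmol density comes down to how big the starting box is for a given composition. The snippet below shows the textbook relation between a target mass density and the edge of a cubic box. The function name, argument names, and the water example are hypothetical, and this is not how Evaluator or packmol compute their box (Evaluator reportedly adds its own buffer, which is not modeled here).

```python
AVOGADRO = 6.02214076e23      # particles per mole
ANGSTROM3_PER_CM3 = 1.0e24    # 1 cm^3 = 1e24 cubic angstroms


def cubic_box_edge_angstrom(counts, molar_masses, density_g_per_ml):
    """Edge length (angstrom) of a cubic box holding `counts` molecules
    at the requested mass density.

    counts:        {component name: number of molecules}   (hypothetical layout)
    molar_masses:  {component name: molar mass in g/mol}
    """
    # Total mass of all molecules in grams.
    total_mass_g = sum(counts[name] * molar_masses[name] for name in counts) / AVOGADRO
    # 1 mL == 1 cm^3, so mass / density gives the volume in cm^3.
    volume_cm3 = total_mass_g / density_g_per_ml
    return (volume_cm3 * ANGSTROM3_PER_CM3) ** (1.0 / 3.0)


# Example: 1000 water molecules near ambient density (roughly a 31 angstrom box).
edge = cubic_box_edge_angstrom({"water": 1000}, {"water": 18.015}, 0.997)
print(f"{edge:.2f} angstrom")
```

Starting packmol at a lower density than the experimental target simply means a larger edge here; whether the NPT run then relaxes to the right volume in the available time is the equilibration question raised above.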

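Illustrative sketch (not from the meeting): MS suggested mapping XMLs to equilibrated boxes and testing inputs for exact equality, while MT noted it depends on how "identical input files" is defined. One possible definition is a hash over the force field text plus the system composition and thermodynamic state. Everything below (function name, fields, units) is a hypothetical layout using only the Python standard library, not an existing Evaluator feature.

```python
import hashlib
import json


def box_cache_key(offxml_text: str, composition: dict[str, int],
                  temperature_k: float, pressure_atm: float) -> str:
    """Return a hex digest that changes whenever any listed input changes."""
    payload = {
        # Hash the force field text itself so any parameter edit yields a new key.
        "force_field_sha256": hashlib.sha256(offxml_text.encode("utf-8")).hexdigest(),
        # Sort components so that dict insertion order does not affect the key.
        "composition": sorted(composition.items()),
        "temperature_k": temperature_k,
        "pressure_atm": pressure_atm,
    }
    blob = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# Example: the key is stable under reordering of the components.
key_a = box_cache_key("<SMIRNOFF/>", {"CCNCC": 500, "O": 500}, 298.15, 1.0)
key_b = box_cache_key("<SMIRNOFF/>", {"O": 500, "CCNCC": 500}, 298.15, 1.0)
print(key_a == key_b)
```

Exact-equality keys like this only reuse a box when nothing has changed, which matches caching the starting point of an optimization; they do not address MT's concern that a cached box from a bad equilibration would be reused as-is.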