2022-05-26 Force Field Release Meeting notes

 Date

May 26, 2022

 Participants

  • @Pavan Behara

  • @Owen Madin

  • @Lily Wang

  • @Chapin Cavender

  • @David Mobley

  • @Jeffrey Wagner

  • @Michael Shirts

  • @Christopher Bayly

 

Recorded meeting link: https://drive.google.com/file/d/178ZU4hjjMltfX54IcqItsWWYDvakqI6i/view?usp=sharing

 Discussion topics

Time

Item

Notes

20 mins

General discussion on v-sites plan

  • DM – LW, have you had a chance to talk with Simon about handoff?

    • LW – Yeah, I’ve spoken with SB twice about this and SB sent over data from Lilac

    • OM: Project Plan

      • OM – Showed that vsites improved performance in many cases.

      • CB – Did the vsite refitting include an LJ refit as well? Or are the before- and after- plots generated using different LJ parameters as well?

      • OM – I’m not sure, will need to look into this

      • CB – Do we have mixture data involving thiophene? Some earlier work had shown the importance of this group.

      • OM – I don’t think so. We didn’t have any sulfur compound data in the Sage training set. The sulfur-containing compounds had fewer than 5 data points each, so they didn’t meet our inclusion criteria.

      • CB – Could we include aromatic sulfur in future training?

      • OM – Right now it’s a data issue. We could relax criteria or find more data to get suitable inputs.

      • CB – Could we move forward without experimental data? Using QC or something else?

      • OM – I don’t think I can make that call.

      • PB – I can get the electrostatic energy decomposition from SAPT, and that could be a good source of theory training data.

      • CB – I do agree with the preference to train on experimental data. But if our FF is going to work on druglike chemistry, we should let theory help guide how we prioritize the search for experimental data.

      • MS – If we have vsites that get tuned to experiment, we’ll have to decide which experiments to do in the first place. And if we only have enough experimental data to use for benchmarking/validation…

      • CB – If, in the context of doing all kinds of chemistry, we’re missing experimental data in one area, would we leave that part of the FF untouched?

      • MS –

      • OM – If we don’t have experimental data, we’d have trouble adjusting LJ values using just theory data

      • CB – People have done that before, though. But we should identify areas where using pure theory is valid (without experimental data). There’s existing work showing that this is valid, but it’s a bit indirect (it works in discovery projects/hit selection)

      • LW – Charges are one area where I’m not sure that high-level theory data translates into usable FFs. …

      • MS – We could say “for sulfur, we’ll just fit to ESPs”. Is there liquid property/experimental data we can use to validate?

        • OM – I could take a look. There may be density data, but there’s less likely to be enthalpy of mixing data, and I could check on solvation free energy data.

      • CB – These off-center charges on S won’t have a big consequence in most cases. Where they’ll have a consequence is when there’s a strong hbond acceptor in the vicinity of the thiophene. So that would be a good way to look for data - basically the neighboring group would be in or out of plane. Like thiophene and pyrrole or another donor-only solvent. So even if you had neat liquid thiophene, it probably wouldn’t be very useful for thiophene fitting - You’d need short range, directional interactions.

      • MS – So maybe we can do stuff with S without a validation set? But I’m really uncomfortable moving ahead with no experimental validation.

      • OM – To be fair, our current LJ params are from OPLS and are intended for use with a different combining rule. So it’s a low bar to clear.

      • CB – The theory comparison I’d do would be a thiophene-water dimer. I’d scan over translation and rotation of the water, in particular around the thiophene group. Then I’d train to ESPs and dimer energies. But this still wouldn’t obviously be a replacement for experimental data. We’d need to make the argument for why we can move forward without experiment.

      • DM – If we already have vsites that are justifiable, maybe we should take that gain and go with it, and then deal with the more complex ones later.

        • MS – Agree

        • CB – Sounds good

      • CB – Did we put “rabbit ears” on carbonyl oxygens?

        • MS – SB didn’t do this experiment, but we could try putting them in.

      • CB – Are we using xtal data?

        • MS – A student in my group is working on that, but its production use is a ways off. There are big open questions about how to use it correctly. It seems like it’ll be used for validation before we use it for training.

    • MS + DM – This is a big point in ad board meetings, folks would love to have vsites in the future.

    • LW – My understanding is that Rosemary is protein parameters at a minimum, and vsites aren’t guaranteed.

  • MS – SB has done quite a bit of work on this, but there is still work to do on deciding which vsites need to be included.
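
OM’s remark that the current LJ parameters come from OPLS and were intended for a different combining rule can be illustrated with a short sketch. It assumes the two rules in question are the geometric rule (used by OPLS) and Lorentz-Berthelot (used by Sage); the notes do not name them. The sulfur and oxygen parameters below are hypothetical placeholders, not values from any released force field.

```python
import math


def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Arithmetic-mean sigma, geometric-mean epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)


def geometric(sigma_i, eps_i, sigma_j, eps_j):
    """Geometric mean for both sigma and epsilon (OPLS convention)."""
    return math.sqrt(sigma_i * sigma_j), math.sqrt(eps_i * eps_j)


def lj_energy(r, sigma, eps):
    """12-6 Lennard-Jones energy at separation r."""
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


# Hypothetical (sigma in Angstrom, epsilon in kcal/mol) pairs for illustration only.
sulfur = (3.55, 0.25)
oxygen = (3.15, 0.15)

lb_sigma, lb_eps = lorentz_berthelot(*sulfur, *oxygen)
geo_sigma, geo_eps = geometric(*sulfur, *oxygen)

# Same per-atom parameters, different cross-interaction depending on the rule.
r = 3.4
print(f"Lorentz-Berthelot: sigma={lb_sigma:.4f}, E(r={r})={lj_energy(r, lb_sigma, lb_eps):.4f}")
print(f"Geometric:         sigma={geo_sigma:.4f}, E(r={r})={lj_energy(r, geo_sigma, geo_eps):.4f}")
```

The two rules give the same cross-term epsilon but different cross-term sigma whenever the atomic sigmas differ, so parameters tuned under one rule are not strictly transferable to the other.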

10 mins

Water model refitting

  • OM – Based on crunching some of the data from the Sage fitting, it looks like we may want to refit the water model.

  • https://drive.google.com/file/d/17sQhGpceQ_nU_jzT1MhHcw24p0thVAi2/view?usp=sharing

  • CB – A few reasons why this concerns me a bit:

    • Our current water functional form has been used to train water models a few times. Even LPW’s refits used this functional form. But these were all trained on pure water experimental data.

      • DM – The original plan, which I still like, is based on the idea that water models are underconstrained in fitting. Like, the properties that we’re fitting to don’t matter much. So my thinking was to take in the “normal” training data, and not include it in the FF training the first time, but use it for validation. Then to start titrating in the pure water data into the training and see how it affects our small molecule data.

      • CB – Agree in principle. But how many parameters are there to fit? charges, angles?

      • CB – Free energies of transfer are specifically tweaking a known issue with fixed charge FFs - the lack of polarizability. But what we’re really doing with transfer free energies is monkeying around the lack of polarizability.

      • MS – hydration free energies are even worse than transfer free energies.

      • CB –

      • DM – The lack of polarizability is worse in hydration free energies than the transfer free energies. Like, it should be half because instead of a dielectric of 2 it has a dielectric of 1.

      • CB – Still concerned that vapor doesn’t have the same intermolecular interactions as liquid. But I do like the idea of keeping the traditional water datasets around for validation, and titrating more of the data in to ensure that the performance doesn’t get terrible.

      • MS – Really, I don’t think that people really care about what the water model does as long as it doesn’t mess up the proteins. People are going to be skeptical of the water model if there’s a risk that it will mess up proteins

      • CC – To add to that, it does seem like the ability of the water model to reproduce tumbling is a limiting factor in matching NMR experiments. And TIP3P is known to be bad for this.

      • MS – My instinct is that no 3 point model will work well here, so we’ll need a 4-point model. I think the successes of TIP3P are largely accidents and aren’t likely to extend to our FFs - They’re likely to become the constraint in our performance.

      • CB – I suggest that we train both a 3-point and a 4-point water model so we have the option for different uses.

        • MS – I think we should stick to one model

        • DM – OM, are you planning to do further fitting work on water models?

        • OM – No

        • DM – Ok, LW, this is something that would be good to look at when you start.

      • DM – I was thinking that these should be co-optimized, so our FF would be designed for use with a specific water model that we also make.
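
For readers less familiar with the 3-point vs. 4-point distinction raised above: a 4-point model moves the negative charge off the oxygen onto a massless site (often called M) on the H-O-H bisector, which is a simple example of a virtual site. The sketch below uses the original TIP4P geometry purely as an illustration; it is an assumption for this example and does not represent any model the group has decided to fit. The function names are hypothetical.

```python
import math


def water_geometry(r_oh, angle_deg):
    """Place O at the origin and both H atoms in the xy-plane, symmetric about +y."""
    half = math.radians(angle_deg) / 2.0
    o = [0.0, 0.0, 0.0]
    h1 = [r_oh * math.sin(half), r_oh * math.cos(half), 0.0]
    h2 = [-r_oh * math.sin(half), r_oh * math.cos(half), 0.0]
    return o, h1, h2


def m_site_position(o, h1, h2, d_om):
    """Return the point at distance d_om from O along the H-O-H bisector."""
    midpoint = [(a + b) / 2.0 for a, b in zip(h1, h2)]
    bisector = [m - c for m, c in zip(midpoint, o)]
    norm = math.sqrt(sum(c * c for c in bisector))
    return [c + d_om * b / norm for c, b in zip(o, bisector)]


# Original TIP4P geometry (Angstrom, degrees), used here only as an example.
o, h1, h2 = water_geometry(r_oh=0.9572, angle_deg=104.52)
m = m_site_position(o, h1, h2, d_om=0.15)
print("M site:", [round(c, 4) for c in m])
```

Because the M site is defined entirely by the positions of the three atoms, it adds charge-placement flexibility without adding degrees of freedom to the molecule.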

 

 

  • PB – Is this a good meeting time?

    • MS – This time works for me now

    • CC – I can try to make this time, may have to jump off early sometimes. I’ll be moving to CA soon so

 Action items

 Decisions