2024-08-14 FF fitting meeting

Participants

  • @Brent Westbrook

  • @Alexandra McIsaac

  • Bill Swope

  • @Jeffrey Wagner

  • @Chapin Cavender

  • @David Mobley

  • @Pavan Behara

  • @Lily Wang

  • @Michael Shirts

  • Barbara Morales

  • Julianne Hoeflich

  • Patrick Frankel

  • Megan Osato

  • @Trevor Gokey

Goals

Discussion topics

Recording passcode: E+3mAmF5

Item

Presenter

Notes

mixture properties in water mixtures

Barbara Morales

  • MS: regressing to previous infrastructure has been problematic in running SFEs

  • Looking at Hmix and density for 7 water models; wants to look at SFEs too but is blocked by infrastructure issues

  • Binary mixtures with water, using Sage 2.1

  • TIP3P “best” water model for alcohols, but still high error

  • TIP4P/TIP3P best for amines and both alc/amine

  • DM: Looks like confidence intervals overlap for the different RMSEs, so it seems hard to tell which is “best” based on RMSE [MS/BM agree]

    • MS: The errors are bootstrapped over molecule

  • DM: May find that another analysis would be helpful, maybe paired t-test, to see statistics on by-molecule basis rather than aggregating over whole dataset [recording around 10:30--not confident I captured this correctly]

    • CC--need a multiple hypothesis correction for paired t test of all pairs

  • Want to re-train LJ with TIP3P_FB and OPC3 or maybe 4-point model, and check OpenFF 1.0

  • JW: I would think OPC would be better than TIP4P, but you see OPC is the worst--is that expected?

    • MS: a bit of a surprise. TIP4P_FB and OPC have similar properties in pure water, but OPC is bad on mixtures

  • DM: why is r2 negative?

    • BM: happens when correlation is really bad; using scikit-learn’s r2_score, and the docs just said it represents very bad correlation

    • DM: thought this would be pearson R, which should always be positive, do you know which r measure it is?

    • BM: I’ll look into it

    • MS: May be worth doing the fit with your own code

    • DM: looks like this is a coefficient of determination

  • MS: reason we want to try Parsley 1.3 is because it’s before we re-trained our LJ parameters with TIP3P, so other models might be better

  • BS: could also train charges

    • CC: BCC in AM1BCC could be trained

    • MS: maybe after we do this, would want to co-tune it with LJ

  • MS: also want to try SFEs with nonpolar molecules

  • MS: goal of this is: should we change our water model?

    • If TIP3P FB and TIP4P are already close to TIP3P, maybe if we reopt LJ for these models, they’d perform even better

    • MS: doesn’t look like 4 point models are better

    • BS: depends what you want--TIP3P doesn’t have some properties

    • MS: yes, TIP4P is closer to real water. TIP3P FB and OPC3 are closer to 4-point model performance than TIP3P

    • BS: TIP4P doesn’t give right density using Ewald methodologies

    • MS: some of that is corrected in TIP3P FB and OPC3

  • JW: I see switching the water model as something we want to do, but only once. Would make sense to align timeline-wise with release of protein FF. Additionally, could think about making our own water model, I think the offxml environment is so alien to people that they won’t care if it’s an existing water model or not

    • MS: do we want to have an intermediate where we recommend an existing water model?

    • DM: don’t think it matters, as long as it’s good. Don’t think people will refuse to use our water model

    • JW: Disagree, I think it’s much easier to release it as one release rather than going through intermediates

    • DM: sure, I mostly meant I don’t think it matters if we also support another water model. People will use whatever we say to use

  • CC: Some molecules will exist as charged species in water (e.g. primary amines); are you doing anything to account for that?

    • BM: no, should look at that

    • CC: I don’t think we did either for Sage, kind of a gap

    • MS: Need to look at pKa and see if it’s near pH 7

    • CC: I think for primary amines, they will be

    • MS: Not just pKa but how does it change as a result of composition

    • BS: have some notes about treating this, who should I send them to?

    • MS: either put in slack for everyone or email to me and I’ll forward it

    • MS: Is there a good pKa predictor for small molecules, or do you have to do QM?

    • BS: probably has been measured

    • DM: if not, pKa prediction is really hard

    • PB: QupKake could work (“QupKake: Integrating Machine Learning and Quantum Chemistry for Micro-pKa Predictions”)
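On the negative r2 question above: scikit-learn's r2_score is the coefficient of determination, R² = 1 - SS_res/SS_tot, which goes negative whenever the predictions fit the reference data worse than simply predicting its mean. The squared Pearson r, by contrast, cannot be negative. The sketch below is plain Python with made-up numbers (not data from the meeting) and hypothetical function names; it evaluates both measures on a prediction that is perfectly correlated with the reference but systematically shifted.

```python
import math


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the same definition r2_score uses."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient, written out by hand."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Illustrative values only: predictions track the reference exactly
# but carry a constant offset of +3.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [4.0, 5.0, 6.0, 7.0]

r2 = coefficient_of_determination(reference, predicted)
r = pearson_r(reference, predicted)
print(f"coefficient of determination: {r2:.2f}")
print(f"Pearson r: {r:.2f}, squared: {r * r:.2f}")
```

Here the Pearson r is 1 because the trend is perfect, while the coefficient of determination is strongly negative because the offset makes every prediction worse than the mean of the reference. A negative value therefore points to systematic error, not necessarily to poor correlation.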

Slow diffusion in lipid simulations

Julianne Hoeflich

  • Overall slow diffusion in lipid simulations, much slower lateral diffusion than MacRog and Slipids

  • Think it’s due to alkane tail behavior

  • Lipid tails are 6-18 C

  • Neither Slipids nor MacRog uses HMR

  • Calculate D from simulations; Sage 2.1, HMR is slower than non-HMR but both are slower than expt

  • JW: to be clear, even with small tail length, still have head groups with ~10 heavy atoms?

    • MS/JH: no, we’re just looking at alkane tails

  • MS: HMR reduces the diffusion constant; COM motion is not affected by HMR, but twisting/rotating dynamics are slowed down because repartitioning the mass changes the moment of inertia

  • JH: The paper for Amber’s most recent lipid FF mentions they had to fine-tune the C-C-C angle for alkanes, which drastically affected lipid diffusion; after tuning the angle they re-trained torsions, which helped a lot

  • D underestimated worse as chain length grows; up to 20% of the diffusion constant for 15 C

    • MS: expect this if barrier is too high

  • Density is pretty accurate

  • TIP3P has results you’d expect for D and density, suggesting it’s not the problem

    • BS: you left out TIP3P D, it’s 3, you predict 6…

    • JH: yeah, it’s true. but we’re looking at alkanes for now, shouldn’t affect it too much

  • Diffusion does not always increase with box size as would be expected; not sure if that’s OK

  • Next steps:

    • re-fit angles/torsions for CCC, then re-run and see if it increases D

    • maybe use QM data or expand dataset, existing torsions aren’t really trained to linear alkanes

      • BW: expecting ~1 week for new dataset

  • TG: If you’re going to do angles, I’d suggest splitting C-C-C vs C-C-H. Currently combined

    • JH: why would those be together…?

    • TG: doesn’t look super different/worth splitting, but I’ve found it’s important

    • BW: I think we tried splitting this and didn’t see much effect?

    • LM: Maybe didn’t affect RMSD/ddE but would affect other things?
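On the box-size point above: for a bulk liquid in a cubic periodic box, the Yeh-Hummer finite-size correction is the usual reason to expect the computed D to rise with box length L, via D_inf = D_PBC + ξ k_B T / (6 π η L) with ξ ≈ 2.837297. The notes do not say whether this correction was applied. The sketch below only illustrates the size of the effect: it assumes a cubic box, and the function name and input values are hypothetical, not taken from the simulations discussed. The viscosity η should be that of the simulated model.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
XI_CUBIC = 2.837297   # lattice constant for a cubic periodic box


def yeh_hummer_correction(temperature_k, viscosity_pa_s, box_length_m):
    """Finite-size term to add to D_PBC, in m^2/s (cubic box assumed)."""
    return (XI_CUBIC * K_B * temperature_k
            / (6.0 * math.pi * viscosity_pa_s * box_length_m))


# Hypothetical inputs for illustration only.
temperature = 298.15   # K
viscosity = 1.0e-3     # Pa*s
for box_nm in (4.0, 8.0):
    delta = yeh_hummer_correction(temperature, viscosity, box_nm * 1e-9)
    print(f"L = {box_nm:.0f} nm: add {delta:.3e} m^2/s to D_PBC")
```

The correction scales as 1/L, so doubling the box length halves it. If D does not trend upward with box size at all, statistical noise in the individual D estimates is one possible explanation worth checking before interpreting the box-size dependence.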

One torsion shape

BW

  • MS: if angles are so dominant, does it mean it’s not properly minimized?

  • JW: usually high angle/vdW would mean it’s a sterics clash

  • SMIRKS string is [#6X3:1]=[#7X2,#7X3+1:2]-[#6X4:3]-[#6X3,#6X4:4], C-NX3-C-C

  • BS: are you sure dark blue dotted line is angle and not vdW 1-4?

    • BW: not 100% sure, but pretty sure

Action items

Decisions