2023-10-19 Protein FF meeting note

Participants

  • @Anika Friedman

  • @Brent Westbrook

  • @David Mobley

  • @Alexandra McIsaac

  • @Michael Gilson

  • @Michael Shirts

  • @Pavan Behara

  • @Chapin Cavender

Goals

  • Update on NMR reweighting fit

  • Benchmarking espaloma on peptide NMR data

Recording

Recording here: https://drive.google.com/file/d/1XEyldKZha_V3KNwrJC5k5bkl6f_v3aJi/view?usp=share_link

Discussion topics

Item

Presenter

Notes

NMR reweighting fit

@Chapin Cavender

  • Slides will be uploaded

  • (slide 40)

    • MG: are these chi squareds from reweighting or actual resimulation?

      • CC: reweighting, so we would need to re-simulate for proper evaluation

    • MG: what’s the transferability here, since in all cases the test-set chi squareds are higher than the training-set ones?

      • CC: since all the peptides are now in the training set and not the test set, I’m not sure I can answer that

      • CC: at least from reweighting on GB3, training on the short peptides doesn’t seem to improve performance, but again we would need to re-sample to really see this

    • CC: IMO adding in the longer peptides introduces more variability in the FF parameters we need in order to agree with experiments, as indicated by the wider spread in alpha

  • (slide 41)

    • CC: The resampling is of the peptides in training

    • CC: my takeaway here is that training FFs with the larger peptides is giving better results, so I’m running larger benchmarks with these FFs. However, globally, it seems like it’s not performing the way we want it to. This could be due to one of two things:

      • 1) the simulations aren’t converged wrt observables and we’re sampling a bit randomly. I don’t think it’s this, as dihedral populations seem to stabilize and converge

      • 2) the parameters are moving too much to be useful here

        • MG: my impression here was that changes in the k values are really small?

        • MG: could be worth looking at torsional profiles; seemingly small changes in k seem to be leading you out of the zone where resampling works

  • MS: when you did the resampling, did you look at statistics like effective number of samples?

    • CC: just looked at a timeseries by eye to see that it plateaus, have not applied more rigorous measures of convergence. Should I?

    • DLM: could be useful – are you sure you’re not just looking at noise? How much of your statistics is driven just by fluctuations? Ruling this out would be a reason to evaluate convergence more rigorously

    • CC: is looking at a timeseries by eye enough?

    • MG: could there be a bug in the reweighting code?

    • CC: that’s a possibility. I validated that the reweighting code gives the same chi squared values as I get from estimating them outside the reweighting code

    • CC: my reweighting code has to read in the dihedrals and recompute observables for the loss function. I’ve tested that the loss functions give the same values as outside the reweighting code. I’m confident the gradient is going in the right direction.

    • MG: suggests an experiment: change the k value of one torsion by hand, re-run the reweighting code, … (recording ~25 min)

      • CC: I can look into that

  • CC: couple other ideas

    • Could keep iterating over resampled trajectories until parameters stop changing. Could be slow

    • For flat CV curves (e.g. slide 39), could try larger alphas so that we enforce small parameter changes, on the order of less than 0.1 kcal/mol.

    • MG: seems like a combination of the two; multiple little steps might take it in the right direction

    • CC: could also try training on just the 5-mer, given that it can actually form a helical bond

      • CC: would take little human time to try this

      • LW: sounds like a good direction

    • MG: likes both ideas

  • MS: chi-squareds on test set (slide 40) seem very large and problematic

    • MS: worth looking at number of effective samples, there’s a tool in MBAR that does that

    • CC: sure, should be easy to do

    • MS: you’d like the number of effective samples to be within an order of magnitude of your actual samples
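
Illustration of the effective-sample-number check discussed above. This is a minimal sketch, not the reweighting code from the slides and not the MBAR tool MS refers to: it uses the Kish estimate N_eff = (sum w)^2 / sum w^2 on reweighting weights w_i proportional to exp(-(U_new - U_old)/kT). The single periodic torsion term, the uniformly sampled dihedrals, the k values, and all function names are assumptions made up for the example.

```python
import math
import random

KT_KCAL_PER_MOL = 0.593  # approximate kT at 298 K, in kcal/mol


def torsion_energy(phi, k, periodicity=1, phase=0.0):
    """Periodic torsion term k * (1 + cos(n * phi - phase)), in kcal/mol."""
    return k * (1.0 + math.cos(periodicity * phi - phase))


def reweighting_weights(phis, k_old, k_new, periodicity=1, phase=0.0,
                        kT=KT_KCAL_PER_MOL):
    """Normalized weights w_i proportional to exp(-(U_new - U_old) / kT)."""
    log_w = [
        -(torsion_energy(p, k_new, periodicity, phase)
          - torsion_energy(p, k_old, periodicity, phase)) / kT
        for p in phis
    ]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def effective_sample_number(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    s = sum(weights)
    return s * s / sum(x * x for x in weights)


# Toy data (assumption): dihedrals drawn uniformly, standing in for a trajectory.
rng = random.Random(0)
phis = [rng.uniform(-math.pi, math.pi) for _ in range(5000)]

# Hypothetical changes in k for one torsion, starting from k_old = 1.0 kcal/mol.
for dk in (0.0, 0.1, 1.0, 3.0):
    w = reweighting_weights(phis, k_old=1.0, k_new=1.0 + dk)
    ratio = effective_sample_number(w) / len(phis)
    print(f"delta k = {dk:.1f} kcal/mol -> N_eff / N = {ratio:.3f}")
```

With real data, phis would be the dihedrals read from the trajectory and the energy difference would include every changed torsion term. The ratio N_eff / N can then be compared against the order-of-magnitude rule of thumb above.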

Espaloma benchmark

@Chapin Cavender

  • Slides will be uploaded

  • MG: what’s the charge model?

    • CC: Espaloma’s, which is fit to AM1-BCC

  • CC: worth running multiple water models on the grant?

    • MS: would be interesting to see it compared to OPC3 and TIP3P-FB

    • CC: can do

Action items

Decisions