2024-06-27 Protein FF meeting note

Participants

  • @Pavan Behara

  • @Anika Friedman

  • @Chapin Cavender

  • @Michael Gilson

  • @Jeffrey Wagner

  • @David Mobley

  • @Brent Westbrook

  • Louis Smith

Goals

  • Update on QM fits with pairwise energy differences

  • Reweighting existing benchmark trajectories to quickly score FFs

Recording

2024-06-27-biopolymer-ff-meeting.mp4

Discussion topics

Item

Presenter

Notes

QM fits with pairwise energy differences

 

@Chapin Cavender

  • Merge the pairwise energy differences used for ff14sb with the sum of squared errors from the ForceBalance method

  • CC determined that the bad step issue was due to the Hessian being calculated incorrectly, which has now been fixed

  • CC: pairwise energy differences lead to the bb torsions looking worse relative to the previous method

  • LS: For Null-0.0.3 NAGL and Null-0.0.3 Pair, the lines show the penalty function before and after optimization, correct?

  • CC: Yes, correct

  • LS: You figure you are about halfway there with the objective function?

  • CC: The new function may take more steps to converge, so not necessarily halfway there.

  • CC: I don't think you can compare the numbers between the two objective functions, but the total fraction taken up by that category is meaningful

  • Louis: Do you usually report the fractional changes?

  • CC: If you were trying to ask which fit was doing better in a particular category you would divide the number by the total for that category
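A minimal sketch of the two objective styles discussed above, not the actual fitting code. It assumes the ForceBalance-style term is a sum of squared errors on energies taken relative to the lowest-energy QM conformer, and that the ff14sb-style term sums squared errors over all conformer pairs. The function names, the unweighted sums, and the category labels are hypothetical. The last helper computes the fraction of the total objective taken up by each category, which is the quantity CC describes as meaningful across the two objective functions.

```python
import numpy as np


def sse_relative_to_min(e_qm, e_mm):
    """Sum of squared errors with both energy sets shifted to the QM minimum conformer."""
    e_qm = np.asarray(e_qm, dtype=float)
    e_mm = np.asarray(e_mm, dtype=float)
    ref = int(np.argmin(e_qm))  # reference conformer (assumption: QM minimum)
    d_qm = e_qm - e_qm[ref]
    d_mm = e_mm - e_mm[ref]
    return float(np.sum((d_mm - d_qm) ** 2))


def sse_pairwise(e_qm, e_mm):
    """Sum of squared errors over energy differences of all unique conformer pairs."""
    err = np.asarray(e_mm, dtype=float) - np.asarray(e_qm, dtype=float)
    # (E_mm[i] - E_mm[j]) - (E_qm[i] - E_qm[j]) equals err[i] - err[j]
    diff = err[:, None] - err[None, :]
    upper = np.triu_indices(len(err), k=1)  # each pair counted once
    return float(np.sum(diff[upper] ** 2))


def category_fractions(contributions):
    """Fraction of the total objective taken up by each category."""
    total = sum(contributions.values())
    return {name: value / total for name, value in contributions.items()}


e_qm = [0.0, 1.0, 2.0]
e_mm = [0.0, 1.0, 3.0]
print(sse_relative_to_min(e_qm, e_mm))  # 1.0
print(sse_pairwise(e_qm, e_mm))         # 2.0
print(category_fractions({"torsion": 30.0, "opt_geo": 10.0}))
```

Both forms are unchanged by a constant energy offset between QM and MM, but the pairwise form does not single out one reference conformer, so the raw numbers from the two objectives are on different scales.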

Quickly scoring FFs by reweighting

@Chapin Cavender

  • CC: One of the hurdles in this project is that it takes a while to get feedback on whether a change has fixed our modelling of folded structures. DM suggested a quicker way to measure how much the FF has improved, since the quicker benchmarks do not reflect the performance of a FF on folded proteins.

  • CC: Still use a chi squared value as the metric of the re-weighted FF

  • CC: Neff is close to 1 when all the weight is concentrated on a single frame; if the frames are equally weighted, Neff will be close to the number of frames in the trajectory.

  • CC: Validated that if we rescore an identical FF the Neff is equal to the number of samples and chi2 ~1

  • CC: Using ff14sb as the reference and re-weighting to a candidate FF leads to a low number of effective samples; chi squared gets much worse, and the predicted chi squared is close to the real benchmarks

  • CC: re-weight from Null-0.0.3 to ff14sb does not give accurate

  • MG: Why are there 3 columns under Neff?

  • CC: That’s the # of samples for each replica and the chi squared is an average

  • MG: You said there is a true chi squared here?

  • CC: Gray is re-weighting to the same FF and those values agree with previous

  • MG: There is a chi squared from doing two separate simulations?

  • CC: All of the chi squared are scoring the agreement after re-weighting

  • MG: Is it true that these chi squared values agree with a reference?

  • CC: The gray line match well to doing chi squared from

  • MG: That’s a chi squared of what relative to what?

  • CC: This is all with reference to NMR agreement

  • MG: It looks like good agreement for re-weighting for Null-0.0.3 despite there being so few samples

  • DM: It’s good and bad news since we get good agreement, but there are so few samples

  • CC: When we scale up to GB3 there are lots of NaNs

  • DM: You may want to generate some samples where you restrain to the state you want to be in

  • Louis: The idea would be that this is a sampling problem, so you need to enhance sampling to get good coverage of phase space. If you at least cover the relevant area this should work. The low number of samples says there is something in the internal dof that’s limiting the samples. You could do something like umbrella sampling to make it fold (some Gō-like model) with Sage-consistent bonded parameters (prevent high energies from) and re-weight additional FFs from that. You in principle know the difference between the folded and unfolded states.

  • CC: If we want to make this meaningful we need to do a structure restrained simulation

  • DM: There are hints even if the sampling is bad that this could tell if other FF would be bad.

  • LS: Can you do this the dumb way? Just really overdo it on Ala5 to get really long trajectories, and then use a down-sampling strategy to see whether a naïve approach to increasing sampling would change the result. If that doesn’t help, then excluded volume must be really serious.

  • CC: You are basically saying run a high temperature simulation for several µs

  • LS: Or do TREX. Get 10x to 20x the sampling you need and see if that helps the re-weighting. If it doesn’t, then you are running into fundamental issues with the internal coordinates.

  • CC: Wouldn’t doing restraints to the native structure of GB3 do the same thing but be more precise?

  • Louis: Doing the approach with the unfolding FF or GB3 would be my priority. If you do something that enhances the sampling then there won't be a conversation with the reviewer on whether the trajectories are long enough.

  • MG: In the end we will simulate with the final FF. This might be silly, but can you lump together trajectories from different FFs and reweight from all of them?

  • Louis: I like this idea because it would nicely solve the problem, but you don’t know how to weight the samples from the two FFs relative to one another. I’m not sure of other ways to connect the two. It seems tricky. I was thinking of making an MSM, but that usually only works in theory.

  • MG: How long are these?

  • CC: 500ns for short peptides

  • MG: Two ways to increase the effective samples are to extend the run or to pull more samples from existing runs

  • Louis: A TREX simulation may answer the question of whether the low number of effective samples is attributable to internal coordinates, like differences in equilibrium bond lengths.

  • MG: There aren’t going to be many frames where you happen to be at the equilibrium value

  • Louis: Doing it at hotter temperatures is probably the naïve way to do this

  • DM: I think the best thing to do is the restraint thing because I don’t know how well higher temperatures will work. Stitching together trajectories is guaranteed to be broken in some way, but it would be hard to know how it’s broken

  • Louis: the enforced folding is likely the best engineering solution

  • MG: What are those restraints going to be?

  • CC: Use a Gō-like model, or use an RC or deviation from the native structure

  • DM: Take the null FF which doesn’t fold and force it to fold and let the small details of the FF settle

  • MG: If you re-weight A to B with different bond lengths then there will still be low samples

  • CC: We don’t compare between ff14sb and null but null to other candidates

  • Louis: What weights can I use that give me the best agreement to experiment? I assume that space is pretty degenerate. Maybe that’s not interesting, but it could help you understand if you did a good job with sample generation and lets you know the best you could do with the raw data you have. If you know what the weights are then this would give you a ceiling for improvement given current sampling. This is something you could do post-hoc.

  • CC: I did something similar when tweaking bb torsions to short peptides.

  • CC: Consensus says the next step is working on the enforced folding method

  • DM: How long did re-weighting take?

  • CC: 5-10 minutes for peptides and ~1 hour for GB3

  • Louis: Can you pilot the enforced folding on Ala5?

  • CC: Folding to what? I will focus on getting GB3 to work since that is the ultimate goal.

  • CC: Run a structure-based simulation with the unfolding FF as the basis for re-weighting. Once the new objective function optimization converges, we can move on to trying the specific model or swapping in Amber's non-bonded parameters with the pairwise energy fits that were used to train ff14sb.
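A minimal sketch of the re-weighting quantities discussed in this item, under stated assumptions rather than the benchmark code itself. It assumes per-frame potential energies in kcal/mol from the reference and candidate FFs, Boltzmann weights from their difference, the Kish form of the effective sample size, and a chi squared defined as the mean squared deviation from experiment in units of the experimental uncertainty. The notes do not specify any of these definitions, and all function names are hypothetical.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reweighting_weights(u_ref, u_target, temperature=298.15):
    """Normalized weights for frames sampled with u_ref, re-weighted to u_target."""
    beta = 1.0 / (KB_KCAL * temperature)
    log_w = -beta * (np.asarray(u_target, dtype=float) - np.asarray(u_ref, dtype=float))
    log_w -= log_w.max()  # shift before exponentiating to avoid overflow
    w = np.exp(log_w)
    return w / w.sum()


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


def reweighted_chi2(observables, weights, exp_values, exp_sigmas):
    """Chi squared of weighted-average observables against experimental values.

    observables has shape (n_frames, n_observables).
    """
    predicted = np.asarray(weights) @ np.asarray(observables, dtype=float)
    z = (predicted - np.asarray(exp_values)) / np.asarray(exp_sigmas)
    return float(np.mean(z ** 2))


n_frames = 100
u_ref = np.zeros(n_frames)

# Identical FF: equal weights, so Neff equals the number of frames.
w_same = reweighting_weights(u_ref, u_ref)
print(effective_sample_size(w_same))

# One frame strongly favored by the candidate FF: Neff collapses toward 1.
u_target = np.zeros(n_frames)
u_target[0] = -100.0
w_skewed = reweighting_weights(u_ref, u_target)
print(effective_sample_size(w_skewed))
```

With these definitions, re-weighting a trajectory to its own FF reproduces the unweighted averages, and a small Neff signals that a few frames dominate the re-weighted estimate.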

Action items

Decisions