2024-06-27 Protein FF meeting note

Participants

Goals

Update on QM fits with pairwise energy differences
Reweighting existing benchmark trajectories to quickly score FFs

Discussion topics

Item

Presenter

Notes

QM fits with pairwise energy differences

Chapin Cavender

Merge the pairwise energy differences from ff14sb and the sum of squared errors from the ForceBalance method
CC determined that the bad step issue was due to the Hessian being calculated incorrectly, which has now been fixed
CC: pairwise energy differences lead to the bb torsions looking worse relative to the previous method
LS: For Null-0.0.3 NAGL versus Null-0.0.3 Pair, the lines show the penalty function before and after optimization, correct?
CC: Yes, correct
LS: You figure you are about halfway there with the objective function?
CC: The new function may take more steps to converge, so not necessarily halfway there.
CC: I don't think you can compare the numbers between the two objective functions, but the total fraction taken up by that category is meaningful
Louis: Do you usually report the fractional changes?
CC: If you were trying to ask which fit was doing better in a particular category you would divide the number by the total for that category
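For reference, the difference between the two objective functions discussed above can be sketched as follows. This is a minimal illustration that assumes the previous objective is a sum of squared errors on energies taken relative to one reference conformer, and the new one is a sum of squared errors over all pairwise conformer energy differences. The function names and toy energies are hypothetical; this is not the ForceBalance or ff14sb fitting code.

```python
import numpy as np


def sse_relative_to_reference(e_qm, e_mm, ref_index=0):
    """Sum of squared errors on energies relative to one reference conformer.

    Assumed form of the previous objective, for illustration only.
    """
    e_qm = np.asarray(e_qm, dtype=float)
    e_mm = np.asarray(e_mm, dtype=float)
    d_qm = e_qm - e_qm[ref_index]
    d_mm = e_mm - e_mm[ref_index]
    return float(np.sum((d_mm - d_qm) ** 2))


def sse_pairwise_differences(e_qm, e_mm):
    """Sum of squared errors over all conformer-pair energy differences.

    Assumed form of the pairwise objective, for illustration only.
    """
    e_qm = np.asarray(e_qm, dtype=float)
    e_mm = np.asarray(e_mm, dtype=float)
    # All unique pairs (i, j) with i < j.
    i, j = np.triu_indices(len(e_qm), k=1)
    return float(np.sum(((e_mm[i] - e_mm[j]) - (e_qm[i] - e_qm[j])) ** 2))


# Toy energies (hypothetical values, arbitrary units).
e_qm = [0.0, 1.0, 3.0]
e_mm = [0.0, 2.0, 3.0]
sse_ref0 = sse_relative_to_reference(e_qm, e_mm, ref_index=0)
sse_ref1 = sse_relative_to_reference(e_qm, e_mm, ref_index=1)
sse_pair = sse_pairwise_differences(e_qm, e_mm)
```

In this toy case the reference-based value changes with the choice of `ref_index`, while the pairwise value has no reference conformer to choose. The two objectives are also on different scales, which is consistent with CC's point that their raw numbers are not directly comparable.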

Quickly scoring FFs by reweighting

Chapin Cavender

CC: One of the hurdles in this project is that it takes a while to get feedback on whether a change has fixed our modelling of folded structures. DM suggested a quicker way to measure how much the FF has improved, since the quicker benchmarks do not reflect the performance of a FF on folded proteins
CC: Still use a chi squared value as the metric of the re-weighted FF
CC: Neff is close to 1 when all the weight is concentrated on an individual frame. If the frames are equally weighted, Neff will be close to the number of frames in the trajectory.
CC: Validated that if we rescore an identical FF the Neff is equal to the number of samples and chi2 ~1
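The two limits of Neff described above can be reproduced with a short sketch. It assumes the weights come from the Boltzmann factor of the per-frame energy difference between the candidate and reference FFs, and that Neff is the Kish effective sample size, (sum w)^2 / sum(w^2). The notes do not state which definition was used, so both the estimator and the function names here are assumptions.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reweighting_weights(u_ref, u_target, temperature=298.0):
    """Normalized per-frame weights for re-weighting from a reference FF to a target FF.

    u_ref, u_target: per-frame potential energies in kcal/mol, evaluated on
    the same frames sampled with the reference FF.
    """
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    log_w = -beta * (np.asarray(u_target, dtype=float) - np.asarray(u_ref, dtype=float))
    log_w -= log_w.max()  # shift before exponentiating to avoid overflow
    w = np.exp(log_w)
    return w / w.sum()


def effective_sample_size(weights):
    """Kish effective sample size (assumed definition of Neff)."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


# Identical FF: every frame gets the same weight, so Neff equals the number of frames.
u = np.zeros(100)
neff_same = effective_sample_size(reweighting_weights(u, u))

# One frame strongly favoured by the target FF: Neff collapses towards 1.
u_target = np.zeros(100)
u_target[0] = -100.0
neff_collapsed = effective_sample_size(reweighting_weights(u, u_target))
```

Shifting the log-weights by their maximum leaves the normalized weights unchanged but avoids overflow when the energy differences are large.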
CC: Using ff14sb as the reference and re-weighting to a candidate FF leads to a low number of effective samples; chi squared gets much worse, and the predicted chi squared is close to the real benchmarks
CC: re-weight from Null-0.0.3 to ff14sb does not give accurate
MG: Why are there 3 columns under Neff?
CC: That’s the # of samples for each replica and the chi squared is an average
MG: You said there is a true chi squared here?
CC: Gray is re-weighting to the same FF and those values agree with previous
MG: There is a chi squared from doing two separate simulations?
CC: All of the chi squared are scoring the agreement after re-weighting
MG: Is it true that these chi squared agree to a reference?
CC: The gray line match well to doing chi squared from
MG: That’s a chi squared of what relative to what?
CC: This is all with reference to NMR agreement
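To make "chi squared after re-weighting, relative to NMR" concrete, here is a minimal sketch. It assumes each NMR observable is predicted per frame, averaged with the re-weighting weights, and compared to experiment as a mean squared deviation in units of the experimental uncertainty. The exact chi squared definition used in the benchmark is not given in these notes, so the formula, the function name, and the toy numbers are assumptions.

```python
import numpy as np


def reweighted_chi_squared(per_frame_obs, weights, exp_values, exp_sigma):
    """Chi squared between weighted ensemble averages and experimental values.

    per_frame_obs: array of shape (n_frames, n_observables)
    weights:       array of shape (n_frames,), need not be normalized
    exp_values, exp_sigma: arrays of shape (n_observables,)
    """
    obs = np.asarray(per_frame_obs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    ensemble_avg = w @ obs  # weighted average of each observable
    z = (ensemble_avg - np.asarray(exp_values, dtype=float)) / np.asarray(exp_sigma, dtype=float)
    return float(np.mean(z ** 2))


# Toy data (hypothetical): two frames, two observables.
obs = [[1.0, 2.0], [3.0, 4.0]]
exp_values = [2.0, 4.0]
exp_sigma = [1.0, 1.0]
chi2_uniform = reweighted_chi_squared(obs, [0.5, 0.5], exp_values, exp_sigma)
chi2_one_frame = reweighted_chi_squared(obs, [1.0, 0.0], exp_values, exp_sigma)
```

With uniform weights this reduces to the ordinary trajectory average, which matches the check that re-weighting a FF to itself reproduces the directly computed value.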
MG: It looks like good agreement for re-weighting for Null-0.0.3 despite there being so few samples
DM: It’s good and bad news since we get good agreement, but there are so few samples
CC: When we scale up to GB3 there are lots of NaNs
DM: You may want to generate some samples where you restrain to the state you want to be in
Louis: The idea would be that this is a sampling problem, so you need to enhance sampling to get good coverage of phase space. If you at least cover the relevant area, this should work. The low number of samples says there is something in the internal degrees of freedom that’s limiting the samples. You could do something like umbrella sampling to make it fold (some Go-like model) with Sage-consistent bonded parameters (prevent high energies from) and re-weight additional FFs from that. You in principle know the difference between the folded and unfolded states.
CC: If we want to make this meaningful we need to do a structure restrained simulation
DM: There are hints even if the sampling is bad that this could tell if other FF would be bad.
LS: Can you do this the dumb way? Just really overdo it on Ala5 and get really long trajectories, then use a down-sampling strategy, and see whether a naïve approach to increasing sampling would change anything. If you can’t do that, then excluded volume must be really serious.
CC: You are basically saying run a high-temperature simulation for several μs
LS: Or do TREX. Get 10x to 20x the sampling you need and see if that helps the re-weighting. If it doesn’t, then you are running into fundamental issues with the internal coordinates
CC: Wouldn’t doing restraints to the native structure of GB3 do the same thing but be more precise?
Louis: Doing the approach with the unfolding FF or GB3 would be my priority. If you do something that enhances the sampling then there won't be a conversation with the reviewer on whether the trajectories are long enough.
MG: In the end we will simulate with the final FF. This might be silly, but can you lump together trajectories from different FFs and reweight from all of them?
Louis: I like this idea because it would nicely solve the problem, but you don’t know how to weight the samples from the two FFs relative to one another. I’m not sure of other ways to connect the two. It seems tricky. I was thinking of making an MSM, but that usually only works in theory.
MG: How long are these?
CC: 500ns for short peptides
MG: Two ways to increase the number of effective samples are to extend the run or to pull more samples from existing runs
Louis: A TREX simulation may answer the question of whether the low number of effective samples is attributable to internal coordinates, like differences in equilibrium bond lengths.
MG: There aren’t going to be many frames where you happen to be at the equilibrium value
Louis: Doing it at hotter temperatures is probably the naïve way to do this
DM: I think the best thing to do is the restraint thing because I don’t know how well higher temperatures will work. Stitching together trajectories is guaranteed to be broken in some way, but it would be hard to know how it’s broken
Louis: the enforced folding is likely the best engineering solution
MG: What are those restraints going to be?
CC: Use a Go-like model, or use an RC or deviation from the native structure
DM: Take the null FF which doesn’t fold and force it to fold and let the small details of the FF settle
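One possible shape for the Go-like restraint is sketched below. It assumes a C-alpha native contact map with a distance cutoff and a 12-10 contact potential whose minimum sits at each native distance; the cutoff, sequence separation, well depth, function names, and toy coordinates are all hypothetical choices, and nothing here was decided in the meeting.

```python
import numpy as np


def native_contacts(native_xyz, cutoff=8.0, min_seq_sep=3):
    """Find residue pairs in contact in the native structure.

    native_xyz: (n_residues, 3) C-alpha coordinates in Angstrom.
    Returns index arrays i, j and the native distances r0 for pairs
    separated by at least min_seq_sep residues and closer than cutoff.
    """
    xyz = np.asarray(native_xyz, dtype=float)
    i, j = np.triu_indices(len(xyz), k=min_seq_sep)
    r0 = np.linalg.norm(xyz[i] - xyz[j], axis=1)
    keep = r0 < cutoff
    return i[keep], j[keep], r0[keep]


def go_contact_energy(xyz, contacts, epsilon=1.0):
    """12-10 Go-like contact energy: -epsilon per contact at its native distance."""
    i, j, r0 = contacts
    xyz = np.asarray(xyz, dtype=float)
    r = np.linalg.norm(xyz[i] - xyz[j], axis=1)
    ratio = r0 / r
    return float(epsilon * np.sum(5.0 * ratio ** 12 - 6.0 * ratio ** 10))


# Toy "native" structure (hypothetical coordinates, 3.8 Angstrom spacing).
native = np.array([
    [0.0, 0.0, 0.0],
    [3.8, 0.0, 0.0],
    [3.8, 3.8, 0.0],
    [0.0, 3.8, 0.0],
    [0.0, 3.8, 3.8],
])
contacts = native_contacts(native)
e_native = go_contact_energy(native, contacts)
e_expanded = go_contact_energy(2.0 * native, contacts)
```

The energy is lowest at the native geometry and rises as contacts break, so adding a term like this to a FF that does not fold biases sampling towards the folded state while the remaining FF terms determine the local details.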
MG: If you re-weight A to B with different bond lengths then there will still be low samples
CC: We don’t compare between ff14sb and null but null to other candidates
Louis: What weights can I use that give me the best agreement to experiment? I assume that space is pretty degenerate. Maybe that’s not interesting, but it could help you understand if you did a good job with sample generation and lets you know the best you could do with the raw data you have. If you know what the weights are then this would give you a ceiling for improvement given current sampling. This is something you could do post-hoc.
CC: I did something similar when tweaking bb torsions for short peptides.
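A rough sketch of the post-hoc "best possible weights" check follows. It assumes the goal is to minimize chi squared against experiment over non-negative frame weights that sum to 1, and it uses exponentiated-gradient updates as one simple way to stay on that constraint. The optimizer, step size, function names, and toy data are assumptions for illustration, not what was done for the short peptides.

```python
import numpy as np


def chi_squared(weights, obs, exp_values, exp_sigma):
    """Mean squared deviation of weighted averages from experiment, in units of sigma."""
    avg = np.asarray(weights, dtype=float) @ np.asarray(obs, dtype=float)
    z = (avg - np.asarray(exp_values, dtype=float)) / np.asarray(exp_sigma, dtype=float)
    return float(np.mean(z ** 2))


def best_case_weights(obs, exp_values, exp_sigma, n_steps=2000, learning_rate=1.0):
    """Search for frame weights on the simplex that minimize chi squared.

    Uses exponentiated-gradient updates (an assumed, simple optimizer choice).
    obs has shape (n_frames, n_observables).
    """
    obs = np.asarray(obs, dtype=float)
    exp_values = np.asarray(exp_values, dtype=float)
    exp_sigma = np.asarray(exp_sigma, dtype=float)
    n_frames, n_obs = obs.shape
    w = np.full(n_frames, 1.0 / n_frames)  # start from uniform weights
    for _ in range(n_steps):
        residual = (w @ obs - exp_values) / exp_sigma ** 2
        grad = (2.0 / n_obs) * (obs @ residual)  # d(chi squared)/d(w_i)
        grad -= grad.min()  # constant shift cancels on normalization, keeps exponents <= 0
        w = w * np.exp(-learning_rate * grad)
        w /= w.sum()
    return w


# Toy data (hypothetical): two frames, one observable, experiment between them.
obs = np.array([[0.0], [1.0]])
exp_values = np.array([0.25])
exp_sigma = np.array([1.0])
w_best = best_case_weights(obs, exp_values, exp_sigma)
chi2_uniform = chi_squared(np.array([0.5, 0.5]), obs, exp_values, exp_sigma)
chi2_best = chi_squared(w_best, obs, exp_values, exp_sigma)
```

The chi squared reached this way is a ceiling on what re-weighting the existing frames could achieve. As Louis notes, the solution is generally degenerate, so the weights themselves should not be over-interpreted, and an unregularized fit can concentrate weight on only a few frames.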
CC: Consensus says the next step is working on the enforced folding method
DM: How long did re-weighting take?
CC: 5-10 minutes for peptides and ~1 hour for GB3
Louis: Can you pilot the enforced folding on Ala5?
CC: Folding to what? I will focus on getting GB3 to work since that is the ultimate goal.
CC: Use a structure-based simulation with the unfolding FF as the basis for re-weighting. Once the new objective function optimization converges, we can move on to trying the specific model or swapping in Amber's non-bonded parameters with the pairwise energy fits that were used to train ff14sb.
