2024-10-03 Protein FF meeting note

Participants

Goals

Update on torsion fits targeting GB3 NMR data

Discussion topics

Item

Presenter

Notes

Water model errors

Pre-0.16 toolkit didn’t assign alternative water models correctly; it used TIP3P LJ params
LS: Is there an older version of TK that wasn’t affected? Is any of your work salvageable? Is all of our work using these models problematic?
- CC: Not sure, think it’s everything
- JW: Older toolkits had different issues, like assigning AM1BCC to water instead of the water model’s charges. Not sure
LS: Have you re-run anything yet, do you know how big of a difference it makes?
- CC: No, not yet; was just going to re-run things that are going into the final pub
- MS: Anika can help too

GB3 NMR torsion fits

Chapin Cavender

[Slide 2] MS: How much slower is the new MBAR workflow?
- CC: Slow enough to make it annoying, changed from minutes to hours to run, but not days
- MS: Could try taking eg every 5th sample if long correlation time, rather than every point. Eg if correlation time includes 100 samples, not getting much extra info from using every sample. Could probably get away with 50 bootstrap samples
LS: How do you think about number of effective samples? Is it actually higher? Seems like it’s just because you’re using more samples?
- CC: Trajectories were 500 ns and I saved frames every 100 ps, so a 5000-frame trajectory; wound up with only a few uncorrelated samples. Now, I’m using all the data [? in recording around 10-15 mins]
- LS: Is it possible to come up with a number of effective samples that doesn’t depend on how much subsampling we do?
- MS: If you have a higher number of effective samples, then you have good overlap. But might not be visiting the right places in the distribution, so might be misleading
- CC: Even though the raw time series is correlated, you lose a lot of data if you only use uncorrelated points
- MG: Fully sub-sampled data is a conservative estimate, can do better by using all the data
- See more discussion in the recording, about 5 minutes starting around 10 mins in and ending around 15-20 mins
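A minimal sketch of the subsampling point above, assuming a simple autocorrelation-based estimate of the statistical inefficiency g and N_eff = N / g. The AR(1) series is synthetic stand-in data (not the GB3 trajectories), and the truncation rule (stop at the first non-positive autocorrelation) is one common choice rather than necessarily what the workflow uses. On this toy series, taking every 5th frame leaves N_eff roughly unchanged as long as the stride is shorter than the correlation time.

```python
import numpy as np

def statistical_inefficiency(x):
    """Estimate g = 1 + 2 * sum_t C(t), truncating at the first non-positive C(t)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0
    g = 1.0
    for t in range(1, n - 1):
        # Normalized autocorrelation at lag t
        c = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)

# Synthetic correlated series: 5000 frames, AR(1) with coefficient 0.9 (assumed)
rng = np.random.default_rng(0)
n_frames = 5000
phi = 0.9
noise = rng.normal(size=n_frames)
series = np.empty(n_frames)
series[0] = noise[0]
for i in range(1, n_frames):
    series[i] = phi * series[i - 1] + noise[i]

g_all = statistical_inefficiency(series)
n_eff_all = n_frames / g_all

# Keep every 5th frame and re-estimate in units of the strided frames
strided = series[::5]
g_strided = statistical_inefficiency(strided)
n_eff_strided = len(strided) / g_strided

print(f"all frames:      g = {g_all:.1f}, N_eff = {n_eff_all:.0f}")
print(f"every 5th frame: g = {g_strided:.1f}, N_eff = {n_eff_strided:.0f}")
```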
[Slide 3] MG: If the denominator in the SEM is Nbootstrap, you can drive the error estimate arbitrarily close to zero by doing more bootstraps, which seems wrong/doesn’t say anything about simulation accuracy
- MS: Bootstrapping should be done with uncorrelated samples. You don’t divide by sqrt(N). Take sample STD from bootstrap samples, don’t divide, can’t drive it to 0 that way
- CC: I was dividing by sqrt(N). Doing 100 bootstraps, so errors should be 10x larger than what is shown, so order of 1 kcal/mol (plot shows 0.1 kcal/mol)
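A minimal sketch of the bootstrap point above, assuming uncorrelated synthetic data and the sample mean as the estimator (both are stand-ins, not the MBAR free energies). The standard error is the sample STD across bootstrap estimates; it stays roughly constant as the number of bootstraps grows, while dividing by sqrt(N_bootstrap) shrinks it artificially.

```python
import numpy as np

def bootstrap_std_error(samples, estimator, n_bootstrap=100, seed=0):
    """Standard error of `estimator` as the sample STD over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = len(samples)
    estimates = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        # Resample with replacement, same size as the original data set
        resample = samples[rng.integers(0, n, size=n)]
        estimates[i] = estimator(resample)
    # No division by sqrt(n_bootstrap): the spread itself is the error estimate
    return estimates.std(ddof=1)

# Synthetic, uncorrelated data (assumed): 400 draws from a unit normal
rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=400)

err_100 = bootstrap_std_error(data, np.mean, n_bootstrap=100)
err_2000 = bootstrap_std_error(data, np.mean, n_bootstrap=2000)

# The mistaken version: dividing by sqrt(100) understates the error by 10x
wrong_100 = err_100 / np.sqrt(100)

print(f"100 bootstraps:  {err_100:.3f}")
print(f"2000 bootstraps: {err_2000:.3f}")
print(f"divided by sqrt(100): {wrong_100:.4f}")
```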
[Slide 4] DM: Trajectories are systematically different at the end of the simulation? Maybe you should try analyzing the other way, e.g. the last 100, 200, 300… ns. More relaxation means free energy of the left side of the plot is getting lower
- LS: Interesting thing to me is they split off from the same point; basin is well sampled for all but all become differently sampled around 0.9
MG: How are these aligned?
- CC: Each curve’s lowest point is set to 0
LS: What was your initialization protocol?
- CC: Steered MD, started from folded and drive harmonic restraint at constant velocity
- LS: From reweighting perspective, with flatter curve like this (slide 6), a lot of ways for something to not have all native contacts, would be pretty easy for you to have not sampled the diversity that much, but may not matter that much either. As long as you have samples in native section, that’s actually what matters for calculating chi
[Slide 8] Uncertainties should be 10x larger
- LS: Did you bootstrap delta or add in quadrature?
- CC: quadrature
- MS: Expect on average to be ~1 STD away, would be shocking if these uncertainties were 10x higher. Just happened to be very close? Presented level of noise looks like what I’d expect, 10x would be really high
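A minimal sketch of the quadrature question above, assuming the two uncertainties are independent. All numbers are made up for illustration. It also shows the check MS describes: how many combined STDs apart two values are.

```python
import numpy as np

def combine_in_quadrature(sigma_a, sigma_b):
    """Uncertainty of a difference (or sum) of two independent quantities."""
    return float(np.hypot(sigma_a, sigma_b))

def deviation_in_sigma(value_a, value_b, sigma_a, sigma_b):
    """Absolute difference expressed in units of the combined uncertainty."""
    return abs(value_a - value_b) / combine_in_quadrature(sigma_a, sigma_b)

# Hypothetical values and uncertainties (not taken from the slides)
sigma_delta = combine_in_quadrature(0.3, 0.4)
z = deviation_in_sigma(1.2, 0.7, 0.3, 0.4)

print(f"sigma_delta = {sigma_delta:.2f}")
print(f"difference = {z:.2f} combined STDs")
```

If the two estimates share samples, the independence assumption fails and bootstrapping the difference directly is the safer route.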
[Slide 8] MG: Must have picked a step size for reweighting?
- CC: No hyperparameters because we’re going all the way from one FF to another
[Slide 12] Uncertainties are from cross-validation, not bootstrapping
[Slide 14] CC: This isn’t using bootstrapping yet
- MS: Not sure you need it here
- MG: I would suggest taking some/all of these, run the simulations, and see how well it agrees
- MS: Have you run umbrella sampling?
- CC: Yes
- MG: my impression is that these are predictions from re-weighting only?
- CC: No, these are umbrella sampling from actual MD runs
- MG: Can you calculate chi2 and see how well it compares to predictions?
- CC: Yes, I’ll do that
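A minimal sketch of the chi2 comparison suggested above, assuming chi2 is the mean squared deviation from experiment in units of the experimental uncertainty. The exact definition used in the fits may differ, and all arrays below are hypothetical placeholders for NMR observables.

```python
import numpy as np

def mean_chi_square(calculated, observed, sigma):
    """Mean of ((calculated - observed) / sigma)^2 over all observables."""
    calculated = np.asarray(calculated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return float(np.mean(((calculated - observed) / sigma) ** 2))

# Hypothetical observables (e.g. scalar couplings in Hz); not real GB3 data
observed = [7.0, 5.0, 9.0]
sigma = [0.5, 0.5, 0.5]
predicted_by_reweighting = [7.5, 4.0, 9.0]
from_new_simulation = [8.0, 5.0, 8.0]

chi2_predicted = mean_chi_square(predicted_by_reweighting, observed, sigma)
chi2_simulated = mean_chi_square(from_new_simulation, observed, sigma)

print(f"chi2 predicted by reweighting: {chi2_predicted:.2f}")
print(f"chi2 from new simulation:      {chi2_simulated:.2f}")
```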
MS: One possibility would be to re-weight from all of them, but would need to eval energies
- MG: Reason for the study was to figure out an alpha value for moving forward, not to use several
- LS: Have you run simulations with any alpha values that you aren’t showing here?
  - CC: No
- LS: agree yellow one looks most promising
- MG: Interesting that it’s not monotonic in alpha, would like to see chi2 to see if it’s random
- LS: all simulations are developing a min in the right place, maybe it just isn’t super sensitive
- MG: If you wanted to, could look at k vectors, but not sure what you’d do with that info
- MS: Advantage of exploring multiple is that they may be sampling different parts of phase space, so you could see that
MS: Both yellow and blue look nice
MG: Could normalize k vectors and take dot products, see how close to 1
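A minimal sketch of the k-vector comparison above, assuming each fit's torsion parameters are flattened into one vector with a consistent ordering. The vectors below are made up; each row stands for the k values from one hypothetical alpha. Entries near 1 mean two fits point in nearly the same direction in k space.

```python
import numpy as np

def cosine_similarity_matrix(k_vectors):
    """Normalize each row to unit length and return all pairwise dot products."""
    k = np.asarray(k_vectors, dtype=float)
    norms = np.linalg.norm(k, axis=1, keepdims=True)
    unit = k / norms
    return unit @ unit.T

# Hypothetical k vectors, one row per alpha value (not fitted parameters)
k_vectors = [
    [1.0, 2.0, 2.0],
    [2.0, 4.0, 4.0],   # same direction as the first row, larger magnitude
    [2.0, -1.0, 0.0],  # orthogonal to the first row
]

similarity = cosine_similarity_matrix(k_vectors)
print(np.round(similarity, 3))
```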
CC: Do we think it’s worth it to use multiple alpha values?
- MS: If we eventually want to understand the set of protein FFs that are consistent with NMR structures, then yes, but if it’d slow down the process, just pick 1
- MG: For a given alpha, set of k’s is deterministic. But landscape may be flat
  - CC: Yes, given a starting value and alpha value
- MS: Could imagine a study looking at all FFs that are consistent with folded structures, and how it affects k. But not our first paper.
LS: reason to look at that would be if we’re in a local min in k space, that’s hard to get out of

Action items

Decisions
