GB3 NMR torsion fits | Chapin Cavender | [Slide 2]
MS: How much slower is the new MBAR workflow?
CC: Slow enough to be annoying. Run time went from minutes to hours, but not days.
MS: If the correlation time is long, you could try taking e.g. every 5th sample rather than every point. E.g. if the correlation time spans 100 samples, you’re not getting much extra info from using every sample. Could probably get away with 50 bootstrap samples.
LS: How do you think about the number of effective samples? Is it actually higher? It seems like it’s just because you’re using more samples.
CC: Trajectories were 500 ns and I saved a frame every 100 ps, giving a 5000-frame trajectory, and I wound up with only a few uncorrelated samples. Now I’m using all the data [? in recording around 10-15 mins]
LS: Is it possible to come up with a number of effective samples that doesn’t depend on how much subsampling we do?
MS: If you have a higher number of effective samples, then you have good overlap. But you might not be visiting the right places in the distribution, so it might be misleading.
CC: Even though the raw time series is correlated, you lose a lot of data if you only use uncorrelated points.
MG: Fully subsampled data is a conservative estimate; you can do better by using all the data.
See more discussion in the recording, about 5 minutes starting around 10 mins in and ending around 15-20 mins.
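A minimal sketch of the quantities discussed above, assuming a toy AR(1) time series as a stand-in for a real observable. The function names, the truncation rule (stop at the first non-positive autocorrelation), and the toy data are illustrative assumptions, not the workflow from the talk. The correlation parameter is chosen so that correlations span roughly 100 frames, matching MS’s example.

```python
import numpy as np

def statistical_inefficiency(x):
    """Estimate g = 1 + 2 * (sum of normalized autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a common heuristic (an assumption here, not the method from the notes).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0
    g = 1.0
    for t in range(1, n - 1):
        c = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)

def make_ar1(n, rho, seed):
    """Toy correlated series (AR(1)); hypothetical stand-in for real data."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = rho * x[i - 1] + np.sqrt(1.0 - rho**2) * noise[i]
    return x

n_frames = 5000  # 500 ns saved every 100 ps, as in the notes
series = make_ar1(n_frames, rho=0.98, seed=0)

g = statistical_inefficiency(series)   # frames per uncorrelated sample
n_eff = n_frames / g                   # effective number of samples
stride = int(np.ceil(g))
uncorrelated = series[::stride]        # fully subsampled (conservative) data

print(f"g ~ {g:.1f} frames, N_eff ~ {n_eff:.0f}, stride = {stride}")
```

Because N_eff is computed from the full series as N/g, it does not depend on the stride chosen afterwards, which is one possible answer to LS’s question. As MS notes, it still says nothing about whether the right regions of the distribution were visited.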
[Slide 3]
MG: If the denominator in the SEM is Nbootstrap, you can drive the uncertainty arbitrarily close to zero by doing more bootstraps, which seems wrong and doesn’t say anything about simulation accuracy.
MS: Bootstrapping should be done with uncorrelated samples. You don’t divide by sqrt(N). Take the sample STD of the bootstrap estimates and don’t divide; you can’t drive it to 0 that way.
CC: I was dividing by sqrt(N). I’m doing 100 bootstraps, so the errors should be 10x larger than what is shown, i.e. on the order of 1 kcal/mol (the plot shows 0.1 kcal/mol).
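MS’s point can be sketched as follows, assuming uncorrelated samples and using the mean as a placeholder estimator. The function name and the synthetic data are hypothetical. The bootstrap standard error is the sample STD of the bootstrap estimates, and it stays roughly constant as the number of bootstraps grows, whereas the extra division by sqrt(N) shrinks it artificially.

```python
import numpy as np

def bootstrap_std_error(samples, estimator, n_boot, rng):
    """Standard error = sample STD of the bootstrap estimates (no sqrt(n_boot))."""
    samples = np.asarray(samples)
    n = samples.size
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample with replacement; assumes the samples are uncorrelated.
        resampled = samples[rng.integers(0, n, size=n)]
        estimates[b] = estimator(resampled)
    return estimates.std(ddof=1)

rng = np.random.default_rng(1)
# Hypothetical uncorrelated data: 400 draws with true STD 2.0,
# so the standard error of the mean is about 2.0 / sqrt(400) = 0.1.
data = rng.normal(loc=0.0, scale=2.0, size=400)

se_100 = bootstrap_std_error(data, np.mean, n_boot=100, rng=rng)
se_2000 = bootstrap_std_error(data, np.mean, n_boot=2000, rng=rng)
wrong_100 = se_100 / np.sqrt(100)  # the mistaken extra division: 10x too small

print(se_100, se_2000, wrong_100)
```

More bootstraps only make the estimate of the standard error less noisy; they do not make the standard error itself smaller.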
[Slide 4]
DM: Are the trajectories systematically different at the end of the simulation? Maybe you should try analyzing the other way, e.g. the last 100, 200, 300… ns. More relaxation means the free energy of the left side of the plot is getting lower.
MG: How are these aligned?
LS: What was your initialization protocol?
CC: Steered MD: started from the folded state and drove a harmonic restraint at constant velocity.
LS: From a reweighting perspective, with a flatter curve like this (slide 6), there are a lot of ways for something to not have all native contacts, so it would be pretty easy to have not sampled that diversity much, but it may not matter that much either. As long as you have samples in the native section, that’s actually what matters for calculating chi.
[Slide 8] Uncertainties should be 10x larger.
LS: Did you bootstrap the delta or add in quadrature?
CC: Quadrature.
MS: You’d expect to be ~1 STD away on average, so it would be shocking if these uncertainties were 10x higher. Did they just happen to be very close? The presented level of noise looks like what I’d expect; 10x would be really high.
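For reference, the quadrature rule and MS’s sanity check can be written out as below. This is a sketch that assumes the two estimates are independent; A and B are placeholder labels for the two quantities being differenced, and z is a hypothetical name for the deviation in units of the combined uncertainty.

```latex
\sigma_{\Delta} = \sqrt{\sigma_A^{2} + \sigma_B^{2}},
\qquad
\sqrt{(10\,\sigma_A)^{2} + (10\,\sigma_B)^{2}} = 10\,\sigma_{\Delta},
\qquad
z = \frac{|\Delta|}{\sigma_{\Delta}}
```

Scaling both input uncertainties by 10 scales the combined uncertainty by 10. For Gaussian errors the expected value of z is about 0.8, so if the error bars were really 10x larger, deviations near 0.1 combined STDs across the board would be surprising, which is the concern MS raises.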
MG: (slide 8) You must have picked a step size for reweighting? (slide 12) Uncertainties from cross-validation, not bootstrapping.
[Slide 14]
CC: This isn’t using bootstrapping yet.
MS: Not sure you need it here.
MG: I would suggest taking some/all of these, running the simulations, and seeing how well they agree.
MS: Have you run umbrella sampling?
CC: Yes.
MG: My impression is that these are predictions from reweighting only?
CC: No, these are umbrella sampling results from actual MD runs.
MG: Can you calculate chi2 and see how well it compares to predictions?
CC: Yes, I’ll do that.
MS: One possibility would be to reweight from all of them, but you would need to evaluate energies.
MG: The reason for the study was to figure out an alpha value for moving forward, not to use several.
LS: Have you run simulations with any alpha values that you aren’t showing here?
LS: Agree the yellow one looks most promising.
MG: Interesting that it’s not monotonic in alpha; I would like to see chi2 to see if it’s random.
LS: All simulations are developing a min in the right place, maybe it just isn’t super sensitive.
MG: If you wanted to, you could look at the k vectors, but not sure what you’d do with that info.
MS: An advantage of exploring multiple is that they may be sampling different parts of phase space, so you could see that.
MS: Both yellow and blue look nice.
MG: Could normalize the k vectors and take dot products to see how close they are to 1.
CC: Do we think it’s worth it to use multiple alpha values?
MS: If we eventually want to understand the set of protein FFs that are consistent with NMR structures, then yes, but if it’d slow down the process, just pick one.
MG: For a given alpha, the set of k’s is deterministic. But the landscape may be flat.
MS: Could imagine a study looking at all FFs that are consistent with folded structures, and how it affects k. But not our first paper.
LS: A reason to look at that would be if we’re in a local min in k space that’s hard to get out of.
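MG’s suggestion to normalize the k vectors and take dot products amounts to a cosine-similarity matrix. A minimal sketch is below; the force-constant values and the alpha labels are made-up placeholders, not results from the fits.

```python
import numpy as np

def cosine_similarity_matrix(k_vectors):
    """Normalize each k vector to unit length and take all pairwise dot products."""
    k = np.asarray(k_vectors, dtype=float)
    norms = np.linalg.norm(k, axis=1, keepdims=True)
    unit = k / norms
    return unit @ unit.T

# Hypothetical torsion force constants for three alpha values (placeholder numbers).
k_by_alpha = {
    "alpha_a": [0.50, -0.20, 0.10, 0.00],
    "alpha_b": [1.00, -0.40, 0.20, 0.00],   # same direction as alpha_a, twice the magnitude
    "alpha_c": [0.10, 0.30, -0.50, 0.20],   # a different direction in k space
}

labels = list(k_by_alpha)
sim = cosine_similarity_matrix([k_by_alpha[name] for name in labels])

for i, name_i in enumerate(labels):
    for j in range(i + 1, len(labels)):
        print(f"{name_i} vs {labels[j]}: {sim[i, j]:+.3f}")
```

A value near 1 means two fits point in the same direction in k space and differ only in magnitude; values well below 1 would suggest the fits landed in different regions, which bears on LS’s local-minimum question.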
|