2021-01-19 WBO/Impropers meeting notes

Participants

@Simon Boothroyd

@Jessica Maat (Deactivated)

@Trevor Gokey

@David Mobley

@Pavan Behara

Discussion

JM: I am almost done with generating the datasets, hitting some troubles with matching the torsions I want to highlight, debugging the smarts torsion match now. Also, regarding the other work on expanding kval plots some molecules are failing in the datasets.
TG: One thing that comes to my mind is when you do chemical environment matches you have to add the colon and number tag on your smarts pattern, that might be the issue here, we can look into that.

DLM: Yeah, TG can help sorting it out, what’s the issue with failing molecules?

TG: It is quite possible to get into trouble with some molecules, a try, except block would be great so you can skip to the next molecule if something fails.

PB: Link for slides

Last week we talked about how the change in priors would affect the parameter values, here for a simple fit with only one interpolated parameter there is not much change in the parameter and objective function values.

DLM: So, the starting point for these parameters is?

PB: This is a combined parameter of t43, 44, 45, I took t43 k-value as k1_bondorder1 and t45 k-value as k1_bondorder2,

`starting point

k1_bondorder1="1.048715180139e+00 * mole**-1 * kilocalorie" k1_bondorder2="5.202617258558e+00 * mole**-1 * kilocalorie"`

DLM: Okay, that’s why with prior 0.2 the final physical parameter is 5. Yeah, for this fit then priors don’t matter much. You can go for a complete refit then.

PB: Okay, the training set would be gen2?

TG/DLM: Yeah, 1.3.0 training set, you can get the targets from the release tarballs.

PB: For fit7, different priors resulted in higher std. deviation among some of the parameters.

DLM: Okay, so there are other minima that can be reached with flexing the priors, check out the torsion profiles for these and see the differences.

SB: One approach CB uses is checking the molecule populations for the gradients on the parameters, some might push in one direction and others might drag in the opposite direction. I can suggest some ways to look into that.

TG: It would be good to see the differences in residual plots between priors 2 and 10. That would also give some insight, I think forcebalance doesn’t spit out the gradients explicitly on a per-molecule basis, some work might be needed there.

PB: Here are the plots showing the distribution for over-represented parameters in molecules with larger ddE, I am planning to do the same analysis for other metrics RMSD and TFD.

DLM: Yeah, it would be good to see how the trends match with other metrics, whether the same parameter is over-represented.

PB: Here is fit8, after removing the over-represented parameter TIG5a, there is not much improvement in the BM plots, and I think this is not the right way to go about it.

DLM: Yeah, the point in finding the over-represented parameters is not to remove them, making the definition better or splitting them into different chemistries.

PB: Here, on the next slide we can see some of the torsion profiles go back to 1.3.0, and some don’t.

DLM/SB/TG: The molecule on the left (below) has intramolecular hydrogen bond and that’s causing a deeper well in the middle. In MM world usually when there is an interaction with a hydrogen bond we see a huge drop in energy, here it is a bit restrained and it is not as deep as other cases. These sort of intramolecular cases can also be excluded from fitting the torsions to avoid such volatile behavior. We can write a smarts pattern to do that sort of things, LPW was also discussing about this for quite a while. SB can look into this and offer any suggestions later. Also, going forward change the color scheme on the plots to distinguish easily between different lines.

PB: Here are the residuals plotted by subtracting from QM energy the MM_intrinsic energy, that is obtained by fixing the atoms involved in torsion, making the matched torsion parameter zero and locally optimizing.

DLM: This is after the correction, right? Okay, this is what I was expecting to look at, we may still need to work on the definition of residual since we may also want to restrict the movement of other atoms.

PB: Can you quickly check whether the MM_intrinsic calculation is correct or not? I am making the particle mass to be zero so that no forces are applied on them and using the localminimizer in openmm, I saw two examples which do geometry optimization using a simulation object and this function.

TG/SB: Yeah, openmm uses the same thing in both, this looks okay. Btw in forcebalance another constraint of 1kcal/mol is applied on the other atoms too so that the final geometry isn’t far away from the initial geometry, you can also look into that.

DLM: In the final picture the peak in the middle is due to sterics, the methyls on the five membered ring in positions 11, 18 are clashing with the oxygens on the right end

SB: So, remind me again how you are filtering the sterics?

PB: For sterics filtering I am calculating the LJ energy difference between the highest energy conformer and lowest energy conformer, and if it is greater than a threshold of 3.5 kcal/mol, which we set eyeballing the chemical structures that have strong sterics, we are filtering out those molecules.

SB: Okay, that’s good.

TG/PB: So, how to handle these cases in general?

DLM: Sterics may fall into the category of optimizing LJ parameters, it is still a difficult task.

SB: Chris raised this point before that fitting the LJ parameters won’t help but optimizing the 1-4 scale factors, which is out of scope for this work but definitely we should plan in the near future.

Action items

@Jessica Maat (Deactivated) will continue working on dataset generation. TG will help with any debugging.

@Jessica Maat (Deactivated) will touch base with Josh Horton on setting up bespoke runs.

@Pavan Behara will do an all valence parameters refit of simple_fit1.

@Pavan Behara will repeat the over-representation analysis for TFD and RMSD metrics.

@Pavan Behara will compare the residuals with different priors (2, 10) on fit7.

@Pavan Behara will compare the residuals on BM set of molecules that match TIGs.

@Simon Boothroyd will give pointers on analyzing the parameter gradients wrt molecule populations, and smarts pattern to filter out for intramolecular hydrogen bonds.