2021-04-23 WBO/Impropers meeting notes

Date

Apr 23, 2021

Participants

  • @Pavan Behara

  • @Jessica Maat (Deactivated)

  • @Simon Boothroyd

  • @Trevor Gokey

  • @Christopher Bayly

  • @David Mobley

Discussion topics

Time

Item

Presenter

Notes

 

Status of work

@Pavan Behara

PB: I don’t have much to share today. I am still strategizing on the QM theory benchmark work. HJ submitted a dataset with B3LYP-D3BJ and different basis sets. I am going through papers, and we may need to include some more functionals. I will try to summarize the path forward by next week.

DLM: Okay, sounds good.

 

Status of work

@Jessica Maat (Deactivated)

JM: I am still working on the impropers and will update next week. I encountered some issues with the toolkit and will post on Slack for assistance from the devs.

CB: It would be nice to have an extended version of SB’s plotmol with 3D visualization for the minima, where one axis is the sum of valence angles, another is the average of the impropers, and the third is the WBO. As we pick molecules along the diagonal of the cube, we hope to see a spectrum of planarity, and the interdependence of the parameters can be captured in a nice way. We will also uncover any shortcomings of AM1-calculated WBOs when compared to QM WBOs.

SB: Yeah, I can put it together for the next meeting.

DLM: We already have QM WBOs in QCA. It would be good to do a benchmark plot like the one the undergrad JM is supervising did for OE vs AT WBOs. PB/TG can help pull out those values if needed.

JM: Yeah, sounds good, I will add that to the undergrad’s tasks.
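
A minimal sketch of the two geometric axes CB describes, for a single trivalent center. The function names are hypothetical, and reading "average of impropers" as the mean deviation from 180° over three cyclic orderings of the neighbors (with the center as the second atom) is an assumption, not something stated in the meeting. The WBO axis would come from a separate AM1 or QM calculation and is not computed here.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def angle_deg(a, b, c):
    """Valence angle a-b-c in degrees, with b at the vertex."""
    v1, v2 = _sub(a, b), _sub(c, b)
    cos_t = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    n = _norm(b1)
    b1 = tuple(x / n for x in b1)
    # Remove the components along the central bond, then measure the angle
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def planarity_descriptors(center, n1, n2, n3):
    """Return (sum of valence angles, mean improper deviation from 180), in degrees.

    A planar center gives (360, 0); pyramidalization lowers the first
    value and raises the second.
    """
    angle_sum = (angle_deg(n1, center, n2)
                 + angle_deg(n2, center, n3)
                 + angle_deg(n1, center, n3))
    # Assumption: average over three cyclic orderings of the neighbors
    orderings = [(n1, n2, n3), (n2, n3, n1), (n3, n1, n2)]
    devs = [180.0 - abs(dihedral_deg(a, center, b, c)) for a, b, c in orderings]
    return angle_sum, sum(devs) / len(devs)
```

Each minimum would contribute one (angle sum, improper deviation, WBO) point to the 3D plot.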



Sulfonamide issue, and the 1.3.1a FF patch

@Christopher Bayly

CB: The new 1.3.1a FF patch with the rolled-back parameters corrects the two values, and we have nice tetrahedrals now. I would suggest we give this to users. I communicated the results to Gaetano, SB, and JW.

In terms of our hopes that the sulfonamide fix would improve the protein-ligand benchmarking on the P38 and PTB datasets, the operation was a success, but our non-equilibrium calcs are not that great. There is still some more science to it, but I am happy with the first aid; it works.

SB: Thanks for the update, it’s really helpful. I will do some QM benchmarks and release it.

CB: This is impressive, we are living a dream where we fixed an FF issue within a week.

SB/DLM: Yeah, really impressive turnaround time.

CB: So, the deeper issue that remains is that when there are endocyclic bonds, such as a four-membered ring with a 90° bond angle, the ring angle should not throw off the parameterization, whatever that angle is. Whatever the right angle is for the sulfonamide, the FF should have captured it without the equilibrium angle being thrown off.

SB: I would like to go through the training sets and see what molecules are contributing to this.

CB: I think we somehow set this up in the infrastructure itself.

SB: I will coordinate with JW on this. I hope this wasn’t anything like the intramolecular hydrogen bonds issue with a wrong sigma value.

CB: Yeah, endocyclic angles occur frequently, and they tell ForceBalance to make the equilibrium angle small, whereas the right equilibrium angle should be tetrahedral-ish. We call it strain, but the equilibrium angle shouldn’t absorb the strain and end up at 90.
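
A toy illustration of the effect CB describes, not a model of what ForceBalance actually does. It assumes a single shared harmonic angle term fitted by least squares to observed angles alone, in which case the fitted equilibrium angle is simply the mean of the targets. The function name and the 8:4 mix of acyclic to four-membered-ring angles are made up for the example.

```python
def fitted_equilibrium_angle(observed_angles_deg):
    """Least-squares theta0 for one shared harmonic angle term.

    Minimizing sum_i (theta_i - theta0)**2 over theta0 gives the mean.
    """
    return sum(observed_angles_deg) / len(observed_angles_deg)

# Hypothetical training targets (degrees)
acyclic_only = [109.5] * 8               # tetrahedral-ish angles
with_rings = acyclic_only + [90.0] * 4   # add four-membered-ring angles

theta0_acyclic = fitted_equilibrium_angle(acyclic_only)  # 109.5
theta0_mixed = fitted_equilibrium_angle(with_rings)      # 103.0
```

In this toy setting, every strained ring angle sharing the parameter pulls the equilibrium value toward 90, which is the behavior CB argues the fit should not show.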

SB: As a diagnostic path forward, I wonder whether we should pull out individual molecules, do a fit, and see whether it is a scientific or a software issue. Any volunteers?

PB: Yeah, I can do that.

CB: Hypervalent sulfur, S with four connections, S with three connections and a charge, etc. An additional experiment is to put in one four-membered ring.

SB: I think we should investigate a debug dataset containing very simply functionalized molecules that would show us these obscure pathologies that we don’t usually capture in TFD/RMSD statistics.

CB: My second AM1-BCC paper in 2004 has some oversimplified sets. You also came up with lego-ing substituents which is a good set.

SB: Yeah, I was looking at that.

DLM: Yeah, I agree. We started with the pharma datasets that have a lot of complexity.

SB: We can just add the canary tests and that would be a really valuable check.

DLM: Yeah, they would be so important.

TG: Also, the change in priors could be another issue. Maybe if we retrain 1.2.0 with different priors we would not see these deviations.

CB: I think the first aid worked. When we look at the other causes, we have knobs that we can turn, like the composition of the datasets, and another knob is the priors. But I think we should look at the systemic issue rather than change some settings: we should analyze the failure and find other areas where the same thing can occur.

SB: I agree. We should really understand the deficiencies, and how our benchmarking failed to catch this, instead of patching things up. As I said before, we should move beyond TFD and RMSD, which obscure these cases.

DLM: Yeah, I completely agree.

 

Hierarchy maps for torsions

@Trevor Gokey

TG: I have been talking about the hierarchy problem in torsions, and I created this dashboard where I compute a Fourier series on the data I have. If a periodicity aligns well with the data we will see a good signal; otherwise we can figure out whether to split or not.

CB: Wouldn’t a periodicity of 4 match the other peaks on graph one, lower left?

TG: The graph on the right shows whether picking a periodicity of 4 would be better or not, the graph on the left is driving this.

CB: Okay, let me rephrase. From the tree graph I can see the hierarchy. High up in the hierarchy, the grouped term covers too much chemistry; when you create child patterns, the graph on the lower right would become clean, producing a graph on the lower left that should also be clean.

TG: Yeah, the essential idea is to pull out signals and analyze the shapes of the graph on the right. This is a work in progress and I will show an improved iteration next week in chemper.

CB: That’s great, I understand the question you are tackling.
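
A rough sketch of the kind of periodicity signal TG describes. The notes do not show the dashboard’s implementation, so everything here is an assumption: the function name is hypothetical, and the profile is taken to be an energy scan sampled evenly over a full 360° turn. Each amplitude measures how strongly a given periodicity is present; a clean parameter should show one or two dominant periodicities, while a term that groups too much chemistry would tend to spread signal across many.

```python
import math

def fourier_amplitudes(angles_deg, energies, max_periodicity=6):
    """Amplitude of each periodicity n = 1..max_periodicity in a torsion profile.

    Assumes the angles are evenly spaced over a full 360 degree turn and
    that max_periodicity is below half the number of samples.
    """
    n_points = len(energies)
    mean_e = sum(energies) / n_points
    amplitudes = {}
    for n in range(1, max_periodicity + 1):
        # Project the mean-centered profile onto cos(n*phi) and sin(n*phi)
        a = sum((e - mean_e) * math.cos(n * math.radians(p))
                for p, e in zip(angles_deg, energies)) * 2.0 / n_points
        b = sum((e - mean_e) * math.sin(n * math.radians(p))
                for p, e in zip(angles_deg, energies)) * 2.0 / n_points
        amplitudes[n] = math.hypot(a, b)
    return amplitudes

# Synthetic profile (made-up numbers): mostly twofold with a small threefold term
scan_angles = [-180 + 15 * i for i in range(24)]
scan_energies = [1.5 * (1 + math.cos(2 * math.radians(p)))
                 + 0.3 * (1 + math.cos(3 * math.radians(p)))
                 for p in scan_angles]
signal = fourier_amplitudes(scan_angles, scan_energies)
dominant = max(signal, key=signal.get)  # periodicity with the largest amplitude
```

For the synthetic profile, the twofold term dominates and the threefold term shows up as a weaker second signal.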

Action items

Decisions