
Date: 15 Oct 2020

Time (min) | Item | Presenter | Notes

10 | WBO interpolated parameters update | Jessica & Pavan

DM – Datasets to remove: we will want to remove the fragmenter and theory benchmark sets. We DO want to include the SiliconTX set.
CB – On slide 4, the pink points make it clear that some parameter splitting will be needed, specifically for t69. It currently mixes ureas (where N is bound to O) and amides (where N isn't bound to O).
- HJ – I wonder whether this can be resolved with the new torsion parameters I came up with for the amide issue. We created a child parameter of t69.
- DM – I still think we should use CB's original suggestion, but we could compare it to HJ's/openff-1.3.0's SMARTS.
- CB – In the Rowley set in particular, we're likely to see things like bonds coming off imidazole nitrogens. The current form of t69 groups these with amides, which is clearly a bad idea.
LPW – On the R² table, do the molecules in the bottom right have anything in common?
- JM – I'd like to look into this.
TG – The torsiondrives in the 80-100 range may be due to faulty data (e.g., driving torsions inside a ring).
- DM – We looked at these specifically, and they are valid.
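For reference, a minimal Python sketch of the two quantities discussed in this item. It assumes the two-point linear interpolation between a bond-order-1 and a bond-order-2 force constant used for SMIRNOFF fractional-bond-order torsions, and assumes the R² values measure how linearly some per-torsion quantity (e.g. barrier height) tracks WBO. Both function names are hypothetical and not part of any OpenFF package.

```python
from statistics import fmean
from typing import Sequence


def interpolate_k(wbo: float, k_bo1: float, k_bo2: float) -> float:
    """Hypothetical helper: linearly interpolate a torsion force constant
    between its bond-order-1 value (k_bo1) and bond-order-2 value (k_bo2)."""
    return k_bo1 + (k_bo2 - k_bo1) * (wbo - 1.0)


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Hypothetical helper: coefficient of determination of the
    ordinary least-squares line y = intercept + slope * x."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # A WBO of 1.5 sits halfway between the two reference force constants.
    print(interpolate_k(1.5, k_bo1=1.0, k_bo2=3.0))  # 2.0
    # Perfectly linear data gives R^2 = 1; scatter around the line lowers it.
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))
```

A low R² for a parameter would suggest that a single linear WBO dependence is a poor fit for the torsions it covers, which is one way the "bottom right" molecules might be investigated.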

10 | Amide update and IC Hessian fitting

IC Hessian implementation in ForceBalance

The first few rows/columns correspond to the stiffer degrees of freedom (bonds and angles) and show the largest magnitudes.
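For reference, the standard relation between Cartesian and internal-coordinate (IC) Hessians, written with the Wilson B-matrix. This assumes a stationary-point geometry and a possibly redundant IC set; we have not checked that ForceBalance uses exactly this form. Here H^(x) is the Cartesian Hessian, H^(q) the IC Hessian, g^(q) the IC gradient and B^+ the Moore–Penrose pseudoinverse of B:

```latex
B_{ij} = \frac{\partial q_i}{\partial x_j}, \qquad
H^{(x)} = B^{\mathsf{T}} H^{(q)} B
  + \sum_i g^{(q)}_i \, \frac{\partial^2 q_i}{\partial \mathbf{x}\,\partial \mathbf{x}^{\mathsf{T}}}

% At a stationary point g^{(q)} = 0, so the second term drops and
H^{(q)}_{\mathrm{proj}} = \left(B^{+}\right)^{\mathsf{T}} H^{(x)} B^{+}
  = P\, H^{(q)} P, \qquad P = B B^{+}
```

The projector P removes redundant combinations of internal coordinates, so only the non-redundant part of the IC Hessian is recovered from QM data.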

Target weights for different target types

CB – For the candidate weights, why not use 1/median?
- HJ – That's a good idea. My weight candidates are only intended as rough guesses; I agree that being more precise would be good.
TG – Since we have so many optimizations, I wonder whether their contributions will always dominate.
- HJ – Agreed. I'd like to have a way to apply similar weights to each TARGET, rather than to each target category.
- LPW – I think changing the distribution of targets COULD change the output of the fit. I hope we can generate roughly equal amounts of data for each target type, but since TDs are so expensive, we can't expect that to be the case.
- DM – Now that many people are trained to fit FFs, we should be able to explore these questions.
- CB – I see a tension between the academic interest in getting the method right and the industry perspective of “getting better binding free energies”. In the Hessian, contributions from bonds and angles dominate, but these won't make a big difference in binding FEs. So I'm inclined to think we want TDs to be VERY heavily weighted.
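A minimal sketch of the 1/median idea and of TG's concern about target counts, assuming we have each target's unweighted objective contribution from a trial fit. The category names, function names and the count-normalization scheme are hypothetical illustrations, not ForceBalance options, and the numbers in the example are made up.

```python
from statistics import median
from typing import Dict, List

# Hypothetical input: category -> unweighted objective contribution of each target.
Contributions = Dict[str, List[float]]


def inverse_median_weights(contribs: Contributions) -> Dict[str, float]:
    # One weight per category: 1 / median of its per-target contributions,
    # so a "typical" target of every category contributes about 1 after weighting.
    return {cat: 1.0 / median(vals) for cat, vals in contribs.items()}


def count_normalized_weights(contribs: Contributions) -> Dict[str, float]:
    # Same, but also divide by the number of targets in the category, so a
    # category with many targets does not dominate through sheer count.
    base = inverse_median_weights(contribs)
    return {cat: base[cat] / len(vals) for cat, vals in contribs.items()}


def category_shares(contribs: Contributions, weights: Dict[str, float]) -> Dict[str, float]:
    # Fraction of the total weighted objective that comes from each category.
    totals = {cat: weights[cat] * sum(vals) for cat, vals in contribs.items()}
    grand = sum(totals.values())
    return {cat: total / grand for cat, total in totals.items()}


if __name__ == "__main__":
    # Made-up numbers: many cheap optimization targets, few torsion drives.
    contribs = {
        "optimization": [0.5, 0.4, 0.6, 0.5, 0.5, 0.4, 0.6, 0.5],
        "torsiondrive": [20.0, 30.0],
    }
    print(category_shares(contribs, inverse_median_weights(contribs)))
    print(category_shares(contribs, count_normalized_weights(contribs)))
```

With plain 1/median weights the eight optimization targets supply about 80% of the objective in this toy case; dividing by the target count brings both categories to about 50%. Any extra emphasis on TDs, as CB suggests, would be a further multiplier on top of either scheme.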

Amide update

DM – Are we ready to release?
- HJ – We’ve changed some prior widths, which may have far-reaching effects, so I’d like to do a bit more testing.
- Hyesu Jang will send a preliminary amide fix file to Arjun Naranyan
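As background on why prior widths can have far-reaching effects: assuming the usual ForceBalance setup, in which each parameter is rescaled by its prior width and an L2 regularization term is added to the weighted sum of target objectives, the objective has roughly the following form. The penalty strength α and the other symbols are illustrative:

```latex
X(\mathbf{k}) = \sum_{t} w_t \, X_t(\mathbf{k})
  + \alpha \sum_{j} \left( \frac{k_j - k_j^{(0)}}{\sigma_j} \right)^{2}
```

Here w_t are the target weights, k_j^(0) the initial parameter values and σ_j the prior widths. In a typical setup one prior width is shared by every parameter of a given kind, so changing it loosens or tightens the restraint on all of those parameters at once.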

20 | Standard benchmarks | Jeff and Josh

Which metrics?
- benchmarkff dataset
- Hyesu’s pre-release benchmarks
Which data and software?
- Prepare a clone of QCA data for local running/comparison (benchmarkff)
- Make sure everyone can run Hyesu’s pre-release benchmarks
Short-term solution vs. long-term solution?
- Short term: Exactly the dataset/software used in existing studies
- Long term: Carefully curated benchmark set, potentially hidden from team. Rules about which overlapping data can be used during training.
  - Could have a tool to compare the degree of overlap
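A minimal sketch of such an overlap tool, assuming each molecule has already been reduced to a canonical identifier string (for example a standard InChIKey); computing that identifier is out of scope here, and the function name is hypothetical. Exact-identifier matching only catches exact duplicates; near-duplicates would need a similarity measure on top of this.

```python
from typing import Dict, Iterable


def overlap_report(train_ids: Iterable[str], bench_ids: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: compare a training set and a benchmark set by identifier."""
    train, bench = set(train_ids), set(bench_ids)
    shared = train & bench
    union = train | bench
    return {
        "n_shared": len(shared),
        # Fraction of benchmark molecules that also appear in the training data.
        "bench_fraction_in_train": len(shared) / len(bench) if bench else 0.0,
        # Jaccard index of the two sets: 1.0 means identical, 0.0 means disjoint.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    print(overlap_report(["A", "B", "C"], ["B", "C", "D", "E"]))
```

The "bench_fraction_in_train" number maps most directly onto a rule such as "no more than X% of benchmark molecules may appear in training".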

10 | Amide update and IC Hessian fitting