2020-10-15 Force Field Release meeting notes

Date

Oct 15, 2020

Participants

  • @Hyesu Jang

  • @Jessica Maat (Deactivated)

  • @Pavan Behara

  • @Jeffrey Wagner

  • @Lee-Ping Wang

  • @David Mobley

Goals

  •  

Discussion topics

Time

Item

Presenter

Notes

10

WBO interpolated parameters update

Jessica & Pavan

  • DM – Datasets to remove – Will want to remove fragmenter and theory benchmarks. We DO want to include SiliconTX set.

  • CB – On slide 4, the pink points make it clear that some parameter splitting will be needed, specifically for t69. Currently it mixes ureas (where N is bound to O) and amides (where N isn’t bound to O).

    • HJ – I wonder if this can be resolved with new torsion parameters I came up with for the amide issue. We came up with a child parameter of t69.

    • DM – I still think we should use CB’s original suggestion, but we could compare those to HJ’s/openff-1.3.0’s SMARTS

    • CB – In Rowley set in particular, we’re likely to see things like bonds coming off imidazole N’s. The current form of t69 will group this with amides, and that’s clearly a bad idea.

  • LPW – On R^2 table, is there anything that the molecules in the bottom right share?

    • JM – I’d like to look into this

  • TG – The torsiondrives in the 80-100 range may be due to faulty data (like driving torsions inside of a ring)

    • DM – We looked at these in particular and they are valid.
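For readers unfamiliar with the R^2 table, here is a minimal sketch of the idea behind WBO-interpolated torsions. It assumes the torsion force constant is interpolated linearly between values assigned at bond orders 1 and 2, and that the R^2 values describe a straight-line fit of torsion barrier against WBO. The function names and all numbers are hypothetical and are not taken from the slides.

```python
def interpolate_k(wbo, k_single, k_double, bo_single=1.0, bo_double=2.0):
    """Linearly interpolate (or extrapolate) a torsion force constant in WBO.

    k_single and k_double are the force constants assigned at bond orders
    bo_single and bo_double (hypothetical values, for illustration only).
    """
    slope = (k_double - k_single) / (bo_double - bo_single)
    return k_single + slope * (wbo - bo_single)


def r_squared(xs, ys):
    """Coefficient of determination of a least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a one-variable linear fit, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical WBOs and torsion barrier heights for one parameter.
wbos = [1.05, 1.20, 1.35, 1.50]
barriers = [2.1, 4.0, 6.2, 7.9]
fit_quality = r_squared(wbos, barriers)
k_mid = interpolate_k(1.5, k_single=1.0, k_double=5.0)
```

A low R^2 for a parameter such as t69 would be consistent with it grouping chemically distinct torsions, which is the motivation for splitting it.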

10

Amide update and IC Hessian fitting

@Hyesu Jang

IC Hessian implementation in ForceBalance

  • First few rows/columns are stiffer degrees of freedom (bonds and angles) and show the largest magnitudes.

Target weights for different target types

  • CB – With candidate weights, why not use 1/median?

    • HJ – That’s a good idea. My weight candidates are just intended to be rough guesses. I agree that being more precise would be good.

  • TG – Since we have so many optimizations, I wonder if their contributions will always dominate

    • HJ – Agreed. I’d kinda like to have a way to apply similar weights to each TARGET, rather than target category.

    • LPW – I think that changing the distribution of targets COULD change the output of the fit. I hope that we can generate roughly equal amounts of data for each target type, but since TDs are so expensive, we can’t expect that to be the case.

    • DM – Now that lots of people are trained to fit FFs, we should be able to explore these questions.

    • CB – I see a tension between the academic interest to get the method correct, and the industry perspective of “getting better binding free energies”. In the Hessian, we see a dominance of contributions from bonds and angles. However, these won’t make a big difference in binding FEs. So I’m inclined to think that we want TDs to be VERY heavily weighted.
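The weighting ideas above can be sketched roughly as follows. This is an illustration only: the category names, the contribution values, and both helper functions are hypothetical, and it does not use ForceBalance's actual input format or API. It shows CB's 1/median suggestion, and one possible way to stop a category from dominating simply because it contains more targets.

```python
from statistics import median


def category_weights(contributions):
    """Weight each target category by 1/median of its per-target contributions."""
    return {cat: 1.0 / median(vals) for cat, vals in contributions.items()}


def weighted_totals(contributions, weights, per_target_average=False):
    """Total weighted contribution of each category to the objective.

    With per_target_average=True the total is divided by the number of
    targets, so a category does not grow just by having more targets.
    """
    totals = {}
    for cat, vals in contributions.items():
        total = weights[cat] * sum(vals)
        if per_target_average:
            total /= len(vals)
        totals[cat] = total
    return totals


# Hypothetical unweighted objective contributions, one value per target.
contributions = {
    "optimized_geometry": [0.02, 0.04, 0.03, 0.05, 0.01, 0.03],
    "torsion_profile": [2.0, 4.0, 3.0],
    "vibrational_frequency": [100.0, 300.0, 200.0, 400.0],
}

weights = category_weights(contributions)
by_count = weighted_totals(contributions, weights)
balanced = weighted_totals(contributions, weights, per_target_average=True)
```

In this toy data, `by_count` is largest for the category with the most targets, which mirrors TG's concern about optimizations. `balanced` removes that dependence on count. Weighting TDs more heavily, as CB suggests, would then be an explicit multiplier on top of either scheme.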

Amide update

  • DM – Are we ready to release?

    • HJ – We’ve changed some prior widths, which may have far-reaching effects, so I’d like to do a bit more testing.

    • @Hyesu Jang will send a preliminary amide fix file to Arjun Naranyan

20

Standard benchmarks

Jeff and Josh

  • Which metrics?

    • benchmarkff dataset

    • Hyesu’s pre-release benchmarks

  • Which data and software?

    • Prepare a clone of QCA data for local running/comparison (benchmarkff)

    • Make sure everyone can run Hyesu’s pre-release benchmarks

  • Short-term solution vs. long-term solution?

    • Short term: Exactly the dataset/software used in existing studies

    • Long term: Carefully curated benchmark set, potentially hidden from team. Rules about which overlapping data can be used during training.

      • Could have tool to compare degree of overlap
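A tool for comparing the degree of overlap could start from something like the sketch below. It assumes each dataset can be reduced to a collection of molecule identifiers (for example canonical SMILES strings). The function name, the report keys, and the example identifiers are hypothetical.

```python
def overlap_report(training_ids, benchmark_ids):
    """Summarize how much a benchmark set overlaps with a training set."""
    training = set(training_ids)
    benchmark = set(benchmark_ids)
    shared = training & benchmark
    union = training | benchmark
    return {
        "shared": sorted(shared),
        # Fraction of benchmark entries that also appear in training.
        "benchmark_fraction_in_training": (
            len(shared) / len(benchmark) if benchmark else 0.0
        ),
        # Jaccard index: shared entries over all distinct entries.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical identifiers; a real tool would read them from the datasets.
report = overlap_report(
    training_ids=["CCO", "CC(=O)N", "c1ccccc1"],
    benchmark_ids=["CCO", "CCN"],
)
```

Exact identifier matching is the simplest choice. A rule about "overlapping data" might also need to consider near-duplicates such as shared fragments or chemically similar torsions, which this sketch does not attempt.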

 

 

 

 

Action items

Decisions