DM – Datasets to remove – Will want to remove fragmenter and theory benchmarks. We DO want to include SiliconTX set.
CB – On slide 4, the pink points make it clear that some parameter splitting will be needed, specifically for t69. Currently it mixes ureas (where N is bound to O) and amides (where N isn’t bound to O).
HJ – I wonder if this can be resolved with new torsion parameters I came up with for the amide issue. We came up with a child parameter of t69.
DM – I still think we should use CB’s original suggestion, but we could compare those to HJ’s/openff-1.3.0’s SMARTS.
CB – In Rowley set in particular, we’re likely to see things like bonds coming off imidazole N’s. The current form of t69 will group this with amides, and that’s clearly a bad idea.
LPW – On R^2 table, is there anything that the molecules in the bottom right share?
JM – I’d like to look into this
TG – The torsiondrives in the 80-100 range may be due to faulty data (e.g., driving torsions inside a ring).
DM – We looked at these in particular and they are valid.
First few rows/columns are stiffer degrees of freedom (bonds and angles) and show the largest magnitudes.
Target weights for different target types
CB – With candidate weights, why not use 1/median?
HJ – That’s a good idea. My weight candidates are just intended to be rough guesses. I agree that being more precise would be good
TG – Since we have so many optimizations, I wonder if their contributions will always dominate
HJ – Agreed. I’d kinda like to have a way to apply similar weights to each TARGET, rather than target category.
LPW – I think that changing the distribution of targets COULD change the output of the fit. Though I hope that we can generate roughly equal amounts of data for each target type. However, since TDs are so expensive, we can’t expect that to be the case.
DM – Now that lots of people are trained to fit FFs, we should be able to explore these questions.
CB – I see a tension between the academic interest in getting the method correct and the industry perspective of “getting better binding free energies”. In the Hessian, we see a dominance of contributions from bonds and angles. However, these won’t make a big difference in binding FEs. So I’m inclined to think that we want TDs to be VERY heavily weighted.
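A minimal sketch of the 1/median idea raised above, to make the proposal concrete. It assumes each target type comes with a list of unweighted per-target objective contributions from some reference run; the function name, the target-type labels, and all numbers are hypothetical and are not taken from the fitting software or from any actual fit.

```python
from statistics import median


def inverse_median_weights(contributions_by_type):
    """Return {target_type: 1 / median(contributions)}.

    `contributions_by_type` maps a target-type label to a list of
    unweighted per-target objective contributions (an assumed input;
    how these are obtained depends on the fitting setup).
    """
    weights = {}
    for target_type, values in contributions_by_type.items():
        if not values:
            raise ValueError(f"no contributions for {target_type!r}")
        m = median(values)
        if m <= 0:
            raise ValueError(f"median must be positive for {target_type!r}")
        weights[target_type] = 1.0 / m
    return weights


# Made-up numbers, for illustration only.
example = {
    "torsion_profile": [2.0, 4.0, 8.0],
    "optimized_geometry": [0.1, 0.2, 0.4, 0.5],
    "vibrational_frequency": [10.0, 20.0, 40.0],
}
weights = inverse_median_weights(example)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

This scales each category so that a typical target contributes about 1 to the objective. It does not address TG’s point that the number of targets per category still matters: a category with many targets can dominate the sum even when its per-target weight is normalized.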
Amide update
DM – Are we ready to release?
HJ – We’ve changed some prior widths, which may have far-reaching effects, so I’d like to do a bit more testing.
Hyesu Jang will send a preliminary amide fix file to Arjun Naranyan
20
Standard benchmarks
Jeff and Josh
Which metrics?
benchmarkff dataset
Hyesu’s pre-release benchmarks
Which data and software?
Prepare a clone of QCA data for local running/comparison (benchmarkff)
Make sure everyone can run Hyesu’s pre-release benchmarks
Short-term solution vs. long-term solution?
Short term: Exactly the dataset/software used in existing studies
Long term: Carefully curated benchmark set, potentially hidden from team. Rules about which overlapping data can be used during training.
Could have tool to compare degree of overlap
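A rough sketch of what an overlap-comparison tool could look like. It assumes each dataset can be reduced to a collection of molecule identifiers that have already been canonicalized the same way (for example, canonical SMILES strings); the function name and the example identifiers are hypothetical.

```python
def overlap_report(training_ids, benchmark_ids):
    """Summarize the overlap between two collections of molecule identifiers.

    Identifiers are compared as plain strings, so both inputs must already
    use the same canonical form (an assumption of this sketch).
    """
    train = set(training_ids)
    bench = set(benchmark_ids)
    shared = train & bench
    union = train | bench
    return {
        "shared": sorted(shared),
        # Fraction of all distinct molecules that appear in both sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of benchmark molecules already present in training.
        "benchmark_fraction_seen": len(shared) / len(bench) if bench else 0.0,
    }


# Made-up identifiers, for illustration only.
report = overlap_report(
    training_ids=["CCO", "CC(=O)N", "c1ccccc1"],
    benchmark_ids=["CC(=O)N", "CCN"],
)
print(report)
```

Exact-identifier matching only catches identical molecules. A rule about “overlapping data” might also need to consider shared parameters or similar chemistry, which this sketch does not attempt.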
Action items
Decisions