Pending re-benchmarking on small-molecule conformers; this was somewhat delayed by substantial recent changes in the benchmarking software.
PLB benchmarks
MO
(Slides will be uploaded)
Expecting most results to be pretty good except p38, which may be noisy
JW – Do many ligands incorporate the major changes in 2.2, e.g. sulfamides, small-ring angles?
DM – Don't expect a lot of difference, but all parameters changed since it was a full re-fit.
LW – Basically 2.1 and 2.2 look pretty comparable.
JW – Cool that the p38 outliers look fully like outliers: massive errors and off-diagonal.
MS – How are the error bounds calculated?
MO – The spread across 3 replicates, not the MBAR error.
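A minimal illustration of MO's point, computing the reported error bound as the spread (sample standard deviation) across replicates rather than a per-run MBAR estimate. The ΔG values are placeholders, not numbers from the benchmark:

```python
import statistics

# Hypothetical ΔG estimates (kcal/mol) from 3 independent replicates of
# one transformation; real values would come from the benchmark runs.
replicates = [-8.1, -7.6, -8.4]

mean_dg = statistics.mean(replicates)
# Error bound = spread across replicates, not the per-run MBAR error.
error = statistics.stdev(replicates)

print(f"dG = {mean_dg:.2f} +/- {error:.2f} kcal/mol")
# prints: dG = -8.03 +/- 0.40 kcal/mol
```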
JW – This was a benchmark to check whether binding energies get worse, and the gist is that they don't? So the release "passed" this check?
DM – Yes
NAGLrc2
LW
(General) – The peak in charge RMSE indicating a difference between AT and OE is interesting. It may be due to aggressive tautomer sampling or salt removal in molecules taken from the SPICE dataset. Either way, this is AmberTools/ATTKW's problem.
PB – For the previous question about the negative charge on nitrogen: I always see a Na+ near it in PubChem molecules; if you search for azanides you can see a few of those.
DM – It seems like this is an improvement over AmberTools, and we should get it out immediately.
(Lots of discussion, see recording ~40 mins.)
Free energies slide: DM – Why are the circles sometimes overlapping the end of the distribution? Might be worth looking into what's going on there.
Final slide
JW – Agree with what LW said. Want to have a month or two where we say "NAGL will soon be used by default; put this in your code if you want to DISABLE that".
MS – Would be good to put out some sort of paper/presentation about what we’re seeing. And communicate/implement
JW – Re: small molecules, maybe we just have a lookup table for everything under ~5 heavy atoms? Or delegate those to AmberTools in the implementation?
PB – This looks production-ready. What are the obstacles for making it the default right now?
LW – We do want to give people a lot of warning before we switch backends on them. Also the benchmarks are finite and so we’d like to give people a chance to report bugs from the wild. Also, we can put in the filter for small mols, but the goal of the project is to make a GNN that can handle small mols.
JW – Agree with all of that. And once we have something out that works, we won't go back and fix stuff for small molecules.
DM – Agree with rollout in the order of “possible to install → installed by default → used by default”
PB – LW, did you do benchmarks on GPU performance? Would proteins be faster with GPUs?
LW – I haven’t specifically done that benchmark. I think it’s fastest with DGL installed, and there’s a DGL-CUDA package that is presumably faster. But the inference deployment doesn’t bring in DGL.
MS – Looking at the error vs. molecule size graph, it looks like NAGL gets better right around 3 or 4 heavy atoms. So the small molecules might be enumerable.
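The small-molecule fallback discussed above (JW's lookup table for molecules under ~5 heavy atoms, with everything else delegated to the GNN or AmberTools) could be sketched as follows. All names and charge values here are hypothetical placeholders, not part of any actual NAGL API:

```python
# Hypothetical precomputed table keyed by canonical SMILES; the charge
# values below are illustrative placeholders only.
SMALL_MOL_CHARGES = {
    "O": [-0.834, 0.417, 0.417],                 # water
    "C": [-0.108, 0.027, 0.027, 0.027, 0.027],   # methane
}

def assign_charges(canonical_smiles, heavy_atom_count, gnn_backend):
    """Return partial charges, preferring the lookup for tiny molecules.

    Falls through to the GNN backend (or a delegate such as AmberTools)
    for anything not covered by the table.
    """
    if heavy_atom_count < 5 and canonical_smiles in SMALL_MOL_CHARGES:
        return SMALL_MOL_CHARGES[canonical_smiles]
    return gnn_backend(canonical_smiles)

# Usage: a tiny molecule hits the table; larger ones fall through.
water_charges = assign_charges("O", 1, lambda s: [0.0])
other_charges = assign_charges("CCO", 3, lambda s: [0.1] * 9)
```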