Updates from team | | DH: Regarding the failing Sage RC optimizations, I looked manually; the failing cases had C#C, C#N, or cubane substructures. DD – Did we see the same problem with earlier FFs? DH – No; for the other FFs, less than 1% of optimizations failed. LD – With Sage RC1 the failures are around 5%.
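A minimal sketch of the kind of substructure check described above, assuming RDKit is available and the failed molecules are on hand as SMILES; the `failed_smiles` list is purely illustrative, and a cubane pattern is omitted:

```python
from rdkit import Chem

# SMARTS patterns mentioned above: alkynes (C#C) and nitriles (C#N).
patterns = {
    "alkyne": Chem.MolFromSmarts("C#C"),
    "nitrile": Chem.MolFromSmarts("C#N"),
}

# Hypothetical list of SMILES for the failed optimizations.
failed_smiles = ["CC#N", "C1CC1C#C", "c1ccccc1"]

for smi in failed_smiles:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        continue
    hits = [name for name, patt in patterns.items() if mol.HasSubstructMatch(patt)]
    print(smi, hits)
```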
DH: I also checked the successful optimizations; there are C#N cases there too, so not all C#N molecules appear to be failing. DD – I'm also curious how the result plots are looking. LD – (shows plots; Sage RC1 is better than 1.2.1 in TFD, but worse in RMSD and dE) DH – This comparison may be biased by bin size. It may be good to add standard deviations or something like that. DD: Looks like we may have recovered some of the RMSD accuracy we lost between 1.2.1 and 1.3.0, and gained quite a bit of accuracy in torsion prediction. SB: We have to take the result with a grain of salt, even if it's encouraging; it's hard to tell whether we're getting a few really good cases and a number of slightly worse ones, for example. JW – It would be good to have cumulative distribution plots; we could use small bins in that case. DH – Good idea. Reviewers on the Lim paper asked for smaller bins as well.
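A quick sketch of the cumulative-distribution idea JW raises, assuming per-molecule RMSD values for each force field are already collected as arrays; the array names and the random placeholder data are illustrative only:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_cdf(values, label):
    """Plot an empirical CDF: fraction of cases at or below each value."""
    x = np.sort(values)
    y = np.arange(1, len(x) + 1) / len(x)
    plt.step(x, y, where="post", label=label)

# Hypothetical per-molecule RMSD arrays (in Angstrom) for two force fields.
rmsd_121 = np.random.default_rng(0).gamma(2.0, 0.3, size=500)
rmsd_rc1 = np.random.default_rng(1).gamma(2.0, 0.25, size=500)

plot_cdf(rmsd_121, "1.2.1")
plot_cdf(rmsd_rc1, "Sage RC1")
plt.xlabel("RMSD (Angstrom)")
plt.ylabel("Cumulative fraction of molecules")
plt.legend()
plt.show()
```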
LD – Yesterday, I had a call with DH on refactor plans. There was one case where a minimum wasn't "caught", and it was because we took the intersection and that removed the minimum from consideration. DH – One big update: Vyutas has run the Sage RC for relative free energy calcs; I expect the data very soon. SB – DM chatted with Ant Nicholls at OE, and we've started a conversation about different ways to do the analysis. He's backlogged now, so we'll hear from him later. JH – Nothing to report. DD – We're taking a two-pronged approach to the MM specs on the public dataset. One approach was to add additional specs to the existing industry dataset. We've run into difficulties with this approach because "engineX" needed a larger size limit, and now we're hitting a new class of error, something deep inside the requests module. I'm trying to figure out why, but I don't have a good conclusion yet; it's likely a size/quantity limit. Basically, QC* is good at handling small-ish datasets, but collections need to get shuttled back and forth between client and server, and they grow over time, so by now these are really big. The other problem is that there are factors that multiply the size/complexity of the objects, so we either end up with really large objects or tons of requests. So I've been retrying these submissions a bunch of times, and sometimes they go through.
SB – So, the goal is to move toward a more paginated POST? DD – Yes, the aim is to be able to apply a spec to an entire collection. The molecules/coordinates are small, so there's no problem uploading them; the goal is to let the server take a spec and apply it to a whole collection itself. SB – I like FastAPI a lot; I'd recommend it if it will plug in. DD – Agreed. Doaa at MolSSI has been poking around with ways to resolve this, but MolSSI is developer-time-constrained right now.
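One way to keep individual requests small, as discussed above, is to split a large collection into fixed-size batches on the client before submitting. This is only an illustrative sketch; `submit_batch` is a hypothetical stand-in for whatever call actually pushes entries to the server:

```python
def chunks(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def submit_in_batches(entries, submit_batch, batch_size=500):
    """Submit a large set of entries as a series of smaller requests.

    submit_batch is a hypothetical callable that sends one batch to the server.
    """
    for batch in chunks(entries, batch_size):
        submit_batch(batch)
```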
DD – So I'm going to automate attempting this submission in a loop, and hopefully these will go through. Some specs are already running or complete. DD – Roughly a 3.5% error rate on the burn-in set using Sage RC1. I checked the method performance on the burn-in set (shows RMSD, dE, and TFD plots); no conclusions yet, since the set isn't great scientifically.
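The "retry in a loop" approach DD describes could look something like the sketch below; `submit_dataset` is a hypothetical zero-argument callable, and catching `RequestException` is an assumption based on the failure reportedly coming from deep inside the requests module:

```python
import time
import requests

def submit_with_retries(submit_dataset, max_attempts=10, base_delay=30):
    """Retry a flaky submission with exponential backoff.

    submit_dataset performs one submission attempt and raises on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_dataset()
        except requests.exceptions.RequestException as err:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({err}); retrying in {delay}s")
            time.sleep(delay)
```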
JW – Worked with DD on release prep; we decided to delay pending the Swope/Lucas analysis code and the RC1 investigation. Will start gathering refactor requirements soon.
How should we handle the RC1 optimization failures? New FF release or infrastructure changes? Next openff-benchmark release? DD – What are the details of SB's locally-run optimizations? SB – I ran on the "industry dataset 1.0", which had the problematic implicit-H molecules from Merck. The error rate didn't seem hugely problematic. DD – I could try to run this on the current public set with the current benchmarking infrastructure. JW – That may not be necessary if we've already observed a similar error rate in the burn-in set. DD – I may try to run the Sage RC1 jobs on QCA/locally anyway.
DD – Can we further delay this new analysis from being in a release? JW will email the partners to tell them that we're delaying the release pending further analysis of the release candidate and the infrastructure.
|
Sage RC release? | Jeff | JW: Assuming Sage RC gives between a 3.5% and 5% error rate, what do we want to do? JW: Changing how we run optimizations to address the Sage RC issues might look like we're gaming things and open us up to criticism. SB: We do need to understand why the failures are occurring; when I ran the ~60k public molecules myself, I did not see widespread failure. SB: Visualization of individual cases is the next step; see whether things are flying off into space, exploding, etc. SB: If these are nitriles, I don't think the parameters changed much from the 1.3 values.
JW: Do we include Sage RC1 in the openff-benchmark release? SB: I recommend delaying; let's gather data on what's happening first, given that we are observing problems ourselves. JW: Agreed; I'm happy to send an email to benchmarking partners to this effect. [decision] We will delay the next release of openff-benchmark. SB: I believe this is the process working; I'm definitely pleased we're doing this.
|