Updates from team

LD – Janssen results are in the public drop zone! OPLS outliers; trying to assess what is wrong for these cases.
- Met with DM, DH: decided to do a small benchmark of torsion profiles.
- DM asked whether electron-donating groups on aniline were bending the nitrogen (since it will switch between planar and tetrahedral). (Shows a case where there's an amine group connected to a phenyl ring substituted with C(N)=C.) In FF optimization, I saw it become planar with all OpenFFs. But aniline with an ortho electron-donating group became planar with OpenFF 1.0-1.2, while it became tetrahedral (10 degrees out of plane) with 1.3 (this may not have been the case; need to re-check data). Using QM, pure aniline is 15 degrees out of plane, but with an electron-donating group it's 25 degrees. With an electron-withdrawing group, it becomes more planar.
SB – This is cool, let's talk about this more in the Friday improper meeting.
LD – Do you mean testing the planarity in the benchmarking?
SB – Want to take common moieties and get an internal-coordinate view, like amine planarity, etc.; when we do the benchmarking we get glaringly obvious deficiencies.
LD – Benchmarking with OPLS4 – working with DH on the new Schrodinger branch.
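The out-of-plane angles discussed above can be measured directly from 3D coordinates. Below is a minimal pure-Python sketch (not part of any OpenFF tool; the geometry and function names are illustrative) that computes a nitrogen pyramidalization angle as the angle between one N-substituent bond and the plane of the other two:

```python
import math

def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def out_of_plane_deg(n, a, b, c):
    """Angle (degrees) between bond n->c and the plane spanned by n->a, n->b.

    ~0 deg = planar (sp2-like) nitrogen; larger values = more pyramidal.
    """
    plane_normal = cross(sub(a, n), sub(b, n))
    v = sub(c, n)
    sin_theta = dot(plane_normal, v) / (norm(plane_normal) * norm(v))
    return abs(math.degrees(math.asin(sin_theta)))

# Idealized planar nitrogen: all three substituents in the z=0 plane
planar = out_of_plane_deg((0, 0, 0), (1, 0, 0), (-0.5, 0.866, 0), (-0.5, -0.866, 0))
# Same geometry with one substituent lifted out of the plane (pyramidalized)
pyramidal = out_of_plane_deg((0, 0, 0), (1, 0, 0), (-0.5, 0.866, 0), (-0.5, -0.866, 0.3))
```

Running this kind of check over FF-minimized vs. QM structures would give the internal-coordinate view SB mentions.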
DH – Adapted the Schrodinger PR to work with newer Schrodinger versions. Also updated documentation, e.g. to better explain the ffbuilder directory usage. I think it's ready to be merged.
LD – Should we tell people that there's a ff-REbuilder tool available?
DH – I'll document that.
DD – How do you specify which OPLS version to use?
DH – It silently switches depending on which Schrodinger suite version you have installed. In many cases you can't use anything other than what your current Schrodinger suite version supports.
DD – Thanks, DH. I sent Kaushik the new instructions but haven't heard from him.
DH – I haven't heard anything either. I emailed him today to say that there's a new version of the branch available.
DD – Should we write a protocol document for this, as a supplement to our previous protocols?
DH – Let's have Kaushik try it before we announce it. I do like the idea of a single place with all the protocol info. I can also ask Gary Tresadern about whether we want this data included in the publication.
DD – Sounds good. I can write the protocol document once we're ready. Even if this data doesn't get into the upcoming publication, the partners would probably find it valuable.
JW – Would caution against adding the Schrodinger pathway to production for Season 1. We've already allocated effort here; don't want to expand scope. If DH and LD can be primary on responding to partners, that could work.
SB – I think the value-add is high here vs. effort; it helps us see if we're heading in the wrong direction on an important metric for partners.
JW – With benchmarking we're having this discrete Season 1; don't want to expand its scope.
SB – Still think the value is high here; comparison to Schrodinger is one of the top 2 things partners are interested in.
DH – In terms of licensing, it's "officially" the same if the benchmarking data leaves the organization.
DD – This seems like a grey area. I agree with SB that there's a big value-add to making the tools available even if we don't collect results.
DH – Agree. Whether or not this goes into the publication, people will find value in it.
DD – We should be ready for the situation where partners have tested this on their sets and want to contribute their data.
JW – I'll reiterate that we don't have infrastructure personnel-time to run devops and user support, so this would need to be handled by DH and LD if Season 1 continues.
DH – I also found that the openff-benchmark analysis from the industry dataset matched the rankings from the Lim set. This is very nice to see.
- DH will keep following up with Kaushik to get feedback on the Schrodinger PR.
- DH will begin writing a protocol doc for the Schrodinger optimization tool.
SB – I've been playing around with torsiondrive benchmarks, especially with the JACS fragmented set. I'm trying to figure out what would be valuable metrics:
- a metric for barrier heights;
- a metric for profile shape (RMSD), and also an RMSD that is influenced by torsion barrier height.
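As a concrete illustration of the metrics under discussion, here is a hedged sketch (not existing openff-benchmark code; the function names and toy numbers are made up) comparing two torsion profiles by barrier height and by shape:

```python
import math

def barrier_height(energies):
    """Barrier height of a torsion profile: max minus min energy."""
    return max(energies) - min(energies)

def profile_rmsd(ref, test):
    """Plain RMSD between two energy profiles sampled on the same grid."""
    assert len(ref) == len(test)
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref))

def barrier_weighted_rmsd(ref, test):
    """Shape RMSD normalized by the reference barrier height, so a
    1 kcal/mol error on a 2 kcal/mol barrier weighs more heavily than
    the same error on a 20 kcal/mol barrier."""
    return profile_rmsd(ref, test) / barrier_height(ref)

# Toy QM and MM profiles on a shared torsion grid (arbitrary energy units)
qm = [0.0, 1.2, 3.5, 1.1, 0.1]
mm = [0.0, 1.0, 2.9, 1.3, 0.2]
dbarrier = barrier_height(mm) - barrier_height(qm)  # negative: MM underestimates
```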
- Trying some benchmarking in parallel to Season 1. DD really helped by getting the package updated to use openff-toolkit (>0.9).
- Parallelization is a concern; need to see if it's possible to batch-parallelize analyses in particular (e.g. TFD).
- Rethinking how we use RMSD for different terms could be good. For large molecules, Cartesian RMSD is not a useful metric. So instead per-term RMSDs could be good (like for angles, impropers, etc.), as well as RMSDs for specific moieties of strong interest.
DD – Do you see these metrics as being part of the benchmarking infrastructure, or living somewhere else?
SB – I'd envision this living in openff-benchmark. My current goal is to have a single "analyze everything" tool that can consume the output of openff-benchmark's optimizations.
DD – Agree. I'd be happy to see openff-benchmark become a library of metrics.
SB – Agree. I think this could eventually be fully schema-ified. Then this would be easily compatible with an eventual web visualization frontend.
DD – This sounds good. Most of the data flow goes between SDF and OpenFF Molecule, and we could replace SDF with OFFMol JSONs.
SB – Could imagine having a JSON model that defines what a season is, which datasets were used, which metrics/analyses were run and how, etc.
JW – Agree. This plan sounds great.
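The "JSON model that defines what a season is" could look something like the following sketch. All field names here are hypothetical, and it uses stdlib dataclasses rather than whatever schema library openff-benchmark might actually adopt:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AnalysisSpec:
    metric: str                              # e.g. "rmsd", "tfd", "dde"
    options: dict = field(default_factory=dict)

@dataclass
class SeasonSpec:
    name: str
    datasets: list        # dataset identifiers used in this season
    forcefields: list     # force fields being benchmarked
    analyses: list        # list of AnalysisSpec

    def to_json(self):
        # asdict recurses into the nested AnalysisSpec dataclasses
        return json.dumps(asdict(self), indent=2)

# Hypothetical season definition, serializable for a web frontend
season1 = SeasonSpec(
    name="season-1",
    datasets=["industry-public-v1.1"],
    forcefields=["openff-1.3.0", "gaff-2.11"],
    analyses=[AnalysisSpec("rmsd"), AnalysisSpec("tfd")],
)
blob = season1.to_json()
```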
JH, JW, DD
- Worked with JW to do the openff-toolkit update PR, and to change testing logic to be more forward-looking.
JW – New QCSubmit release? We'd previously heavily pinned dependencies to make the industry benchmark results standard across partner sites. So now we're removing as many pins as we can. This should help with env flexibility, but it's possible that this will change results. I noticed that unpinning QCEngine breaks things – the 0.19 version crashed when we tried to use it.
- Working with LD on reproducing the Lim dataset. We're blocked on pulling down old QCA datasets, waiting on BP to finish migration.
- Adding multiprocessing to server export (off-b PR #82).
- Putting the optimization executor into geopt – once this is done, I'll replace openff-benchmark's internal executor with calls to this.
- Messaged Christina Schindler and got a new public submission from Merck with explicit hydrogens. Will make a new v1.1 industry public dataset that replaces Merck's old submission with the new one.
- Industry public dataset is still being computed on QCA – 90% complete.
- Checking for missing TFDs; still crunching data to check for this.
DH – Any results on this?
DD – I haven't reproduced this yet, just need more results.
DH – If the molecule doesn't have any rotatable bonds then we shouldn't get any TFDs.
Set of questions:
- If TFD is missing for some methods but not all for a single molecule/conformer, that is unexpected.
- If TFD is missing for some conformers of a single molecule, but not all.
JW – Identified in Fox set BRI-00973: see TFD for some conformers, but not all (matched_openff-1.3.0, matched_gaff-2.11).
DH – We observe a different reference used for comparison, which could be a case of different connectivity between the compared conformers; that would trigger non-matching SMILES between the conformers.
DH – Seems like RDKit produces canonical SMILES?
JW – Did you see any of these cases in your own results?
LD – Let me check; I will investigate these questions on the Janssen set.
JW – The Fox set might actually be pretty challenging, so it's not surprising if we're getting exotic issues.
DH – So yes, LD, we can probably look at some examples from our set and see if these are cases of stereochemistry or connectivity changes.
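The mismatch check described above reduces to grouping a molecule's conformers by a canonical identifier. A sketch of that bookkeeping (the SMILES strings below are stand-ins for what a cheminformatics toolkit would actually return; the function name is hypothetical):

```python
from collections import defaultdict

def group_conformers_by_smiles(conformer_smiles):
    """Given {conformer_id: canonical_smiles}, group conformer ids by SMILES.

    More than one group means connectivity or stereochemistry changed
    between conformers, which would make cross-conformer comparisons
    (e.g. TFD against a reference) non-matching.
    """
    groups = defaultdict(list)
    for conf_id, smi in conformer_smiles.items():
        groups[smi].append(conf_id)
    return dict(groups)

# Stand-in data: conformer 2 flipped a double-bond stereocenter
groups = group_conformers_by_smiles({
    "conf-00": "C/C=C/C",
    "conf-01": "C/C=C/C",
    "conf-02": r"C/C=C\C",
})
inconsistent = len(groups) > 1
```

A check like this, run per molecule after QM optimization, would flag cases such as BRI-00973 before the analysis stage.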
DH – The problem case that I can look at in the Janssen set has double-bond stereochemistry in a macrocycle. The root cause appears to be that some RDKit-generated conformers of this macrocycle are very distorted, such that the double bond either represents a different stereochemistry even before QM, or the structure is so bad that the QM drives it to change stereo.
(General) – This is a to-do item for Season 2.
LD – The QM optimization preserves the stereochemistry, so this is probably a problem with the generated conformer.
JW – To solve this, we could re-perceive stereo from 3D immediately after conformer generation, and ensure that it matches what we expect. We're probably not catching this problem now because we blindly trust the stereochemistry defined in the BONDS section of the SDF, regardless of the 3D coordinates.
DH – Can we ask Thomas whether he checks that the generated conformers have the same stereochemistry, for say BRI-00973?
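The re-perception check JW proposes could be prototyped along these lines. This is an illustrative pure-Python sketch, not the actual fix (in practice one would use a toolkit routine such as RDKit's stereo perception from 3D); it labels a double bond E/Z from the dihedral of its four defining atoms:

```python
import math

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle (degrees) for atoms p0-p1-p2-p3 (standard atan2 form)."""
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def cross(u, v): return (u[1]*v[2] - u[2]*v[1],
                             u[2]*v[0] - u[0]*v[2],
                             u[0]*v[1] - u[1]*v[0])
    def dot(u, v): return sum(x * y for x, y in zip(u, v))
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    b2_unit = tuple(x / math.sqrt(dot(b2, b2)) for x in b2)
    m1 = cross(n1, b2_unit)
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def perceive_ez(p0, p1, p2, p3):
    """Label the p1=p2 double bond Z (cis) or E (trans) from 3D coordinates."""
    return "Z" if abs(dihedral_deg(p0, p1, p2, p3)) < 90.0 else "E"

# Idealized trans (E) and cis (Z) arrangements of substituents around C=C
trans = perceive_ez((-0.7, 1.0, 0), (0, 0, 0), (1.33, 0, 0), (2.03, -1.0, 0))
cis = perceive_ez((-0.7, 1.0, 0), (0, 0, 0), (1.33, 0, 0), (2.03, 1.0, 0))
```

Comparing this 3D-derived label against the stereo flag recorded in the SDF, immediately after conformer generation, would catch the distorted-macrocycle conformers before they reach QM.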