Refine goals for work
| How to get a new molecule test set
JW – Should be a set of molecule inputs that hits every edge case handled in our code and raises every possible warning and error. Also should have 20-3 “totally normal” molecules that are processed successfully.
AD – Should it be a minimal set, or a minimal distinct set? If one mol can trigger 4 errors, should I include just that one, or 4 mols (one for each error)?
JW – I think one for each error, so that we can ensure that each one hits the appropriate code.
AD – When reading molecules with RDKit and then calling from_rdkit, I get an RDKit error (specifically, this mol contains a germanide, which raises a valence error).
JW – I’d love to be able to say “if the following things are true about an input mol, then it’s appropriate as input for OpenFF” in the docs.
AD – It’s hard to catch RDKit errors/warnings, since they’re frequently spurious. If I’m catching errors/warnings to check for real problems, how do we handle all the cases (like “title line greater than 80 characters”)?
AD – In many of these cases, RDKit warns that it may be mangling a molecule in situations where OE loads it just fine.
JW – We could put a lot of trust in the cheminf toolkit sanitization/validation, but we’d still want to guarantee that 99%+ of molecules loaded from reasonable druglike input come out identical.
AD – I’d like to be able to present a list of mismatches, but the way stereo and implicit hydrogens are handled makes that extremely complex.
AD – I’ve been looking into stereochemistry (issue 146) and am looking for reproducing examples of the different cases there.
AD – Working on automating creation of a test set that exercises all code paths in the current toolkit, using a minimal set of molecules that exercise those paths.
AD – Stereo coverage in the test set? Long chains of directional double bonds? Should we have separate test cases for stereo in rings vs. not in rings? Do we want to record undefined stereocenters?
JW – Yes, eventually, but I think this might be a big lift, so let’s not make it an initial goal. It’s fine if we include some cases of this in the test set for future use, though.
AD – Stereo from 3D vs. connection table? You had said that we should use the connection table over 3D, but OpenEye uses 3D by default.
JW – I may have been wrong about our existing implementation of this. My priority is to ensure that both toolkit wrappers get the same molecule out of the same file, so if an SDF has 3D stereo that contradicts the connection table stereo, we should ensure that the same molecule comes out.
Cases where an atom COULD be chiral but the molecule is actually symmetrical, so it's not.
AD – Having an “OpenFF stereo model”?
|
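Note on the "minimal set vs. one mol per error" question: a rough sketch of the two selection strategies, assuming we already have a table of which errors/warnings each candidate molecule triggers. The molecule names and error labels below are hypothetical placeholders, not real toolkit output, and the greedy cover is only an approximation of a true minimal set.

```python
from typing import Dict, List, Set


def pick_one_molecule_per_error(triggers: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each error label to one molecule that triggers it.

    Prefers the molecule that triggers the fewest errors overall, so each
    chosen input isolates its error as far as possible. Ties are broken
    alphabetically by molecule name.
    """
    all_errors = sorted(set().union(*triggers.values()))
    chosen = {}
    for error in all_errors:
        candidates = [name for name, errs in triggers.items() if error in errs]
        chosen[error] = min(candidates, key=lambda n: (len(triggers[n]), n))
    return chosen


def greedy_minimal_cover(triggers: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: a small set of molecules that together trigger every error.

    At each step, take the molecule covering the most still-uncovered errors.
    This is an approximation; it is not guaranteed to be the true minimum.
    """
    uncovered = set().union(*triggers.values())
    picked = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(triggers), key=lambda n: len(triggers[n] & uncovered))
        picked.append(best)
        uncovered -= triggers[best]
    return picked


# Hypothetical input: molecule name -> labels of the errors/warnings it raises.
example_triggers = {
    "mol_a": {"ValenceError", "UndefinedStereoWarning", "RadicalError", "ChargeWarning"},
    "mol_b": {"ValenceError"},
    "mol_c": {"UndefinedStereoWarning"},
    "mol_d": {"RadicalError", "ChargeWarning"},
}

minimal_set = greedy_minimal_cover(example_triggers)
one_per_error = pick_one_molecule_per_error(example_triggers)
```

With this example, the minimal cover collapses to the single molecule that triggers all four errors, while the one-per-error selection avoids it in favor of molecules that trigger fewer errors each. That matches the preference above for inputs where each error can be checked against the appropriate code path.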