2025-04-23 Meeting notes

Participants

  • @Jennifer Clark

  • @Jeffrey Wagner

  • @Lily Wang

  • @Matt Thompson

Discussion topics

Item

Notes

 

 

Multi-conformer torsiondrives not supported

  • LW – Possibly the most important thing for me right now. I know you’re working on an ICRMSD solution and was wondering what the status of that is.

  • MT – ICRMSDs are a mess, not sure we’ve made progress in the past few days. The RDKit solution out of the box wasn’t great. MDAnalysis would be the next one to explore. Not sure what a “reference” implementation is here.

  • LW – I think MDAnalysis might be helpful here. We’d talked about this in a SMIRNOFF meeting a long time ago, and found that… But the important thing is to use the same method to measure for both mols.

  • JW – Two major issues I recall: 1 - measurement issues (±180°, dihedral handedness kinds of things) and 2 - methyl rotation/graceful handling of symmetries.

  • LW – How bad was geomeTRIC solution?

    • MT – Unsure.

    • LW – I don’t think we had issues with geometric RMSDs. My big issue was looking at outliers by assigned parameter. But that’s when I found that geomeTRIC would interpret topology differently for different conformers.

    • MT – Do we think geomeTRIC is doing spooky xyz2mol style things to infer things about the topology?

      • LW – Maybe

    • MT – I think it seems like geomeTRIC is doing nice things on a population scale, but when we look at individual cases it gets ugly.

  • LW – If we don’t do fancy outlier detection, but we do want to do the ICRMSD plots like we did in previous FF releases, would YAMMBS be able to do that?

    • MT – I’m pretty sure that YAMMBS has ported the old behavior + code as it was before. So it’s kinda OK to stick with the kinda bad geomeTRIC solution, since that’s what we had before.

    • LW – Great, thanks.

  • Back to the top-level issue in this cell

  • YAMMBS can now deal with torsiondrives, but we’ve never tested it on multi-conformer-seeded torsiondrives. Those should be treated as DIFFERENT torsiondrives, EACH with a minimum optimization at each grid point.

    • MT – So we should identify torsiondrives by their QC record ID or something, so that the same molecule could have multiple torsiondrives

    • LW – This isn’t a blocker to 2.3.0, but is important for 2.3.0

    • MT – Glad this isn’t a blocker for 2.3.0, but this does make the previous torsion analysis pretty suspect.
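To illustrate the point above about treating multi-conformer-seeded torsiondrives as different torsiondrives, here is a minimal sketch. The row layout, record IDs, and energies are hypothetical and are not YAMMBS's actual schema. It only shows that grouping by molecule merges two torsiondrives into one profile, while grouping by torsiondrive record ID keeps them separate.

```python
from collections import defaultdict

# Hypothetical flattened rows (NOT the real YAMMBS schema):
# (torsiondrive_record_id, mapped_smiles, grid_angle_deg, energy)
rows = [
    (101, "[H:1][O:2][O:3][H:4]", -90, 1.2),
    (101, "[H:1][O:2][O:3][H:4]", 90, 1.1),
    (102, "[H:1][O:2][O:3][H:4]", -90, 1.5),
    (102, "[H:1][O:2][O:3][H:4]", 90, 0.9),
]


def profiles_by(rows, key_index):
    """Build {key: {grid_angle: energy}} using the chosen column as the key."""
    profiles = defaultdict(dict)
    for row in rows:
        angle, energy = row[2], row[3]
        # A later row with the same key and angle silently overwrites an earlier one.
        profiles[row[key_index]][angle] = energy
    return dict(profiles)


by_record = profiles_by(rows, 0)    # one profile per torsiondrive
by_molecule = profiles_by(rows, 1)  # both torsiondrives collapse into one profile

print(len(by_record), len(by_molecule))
```

With these made-up rows, keying by molecule leaves a single profile in which the second torsiondrive's energies have replaced the first's, which is the kind of silent merging that keying by record ID avoids.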


 
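On the ±180 measurement issue JW mentioned in the notes above, a small sketch of one common way to compare dihedral angles is to wrap each difference into [-180, 180) before squaring. This is only an illustration with hypothetical function names. It is not the geomeTRIC, RDKit, or MDAnalysis implementation, and it does not address the symmetry/methyl-rotation issue.

```python
import math


def wrap_degrees(delta):
    """Map an angle difference in degrees onto the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def dihedral_rmsd(reference, other):
    """RMSD over paired dihedral angles (degrees), using wrapped differences."""
    if len(reference) != len(other):
        raise ValueError("both structures must list the same dihedrals")
    diffs = [wrap_degrees(b - a) for a, b in zip(reference, other)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# 179 and -179 degrees are only 2 degrees apart on the circle;
# a naive subtraction would report 358.
print(dihedral_rmsd([179.0], [-179.0]))
```

Whatever tool ends up being used, the same wrapping convention would need to be applied to both molecules being compared, in line with LW's point about measuring both the same way.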

InChI mapping bottleneck

  • LW – This is just a blocker in that it’s slow. I’m also not sure that it’s a necessary step in our process.

  • MT – This may just be historical/a remnant of previous use of multiprocessing.

  • LW – It looks like, after the code groups things, it flattens them out.

  • MT – … It looks like the way that the PR interacts with the database is a strict improvement.

  • LW – But somehow these changes drop 45 records, and I don’t know why.

  • MT – Molecule IDs are one-for-one to mapped SMILES. But the same chemical species (InChI) could have a one-to-many relationship to mapped SMILES. So I should try dropping the use of InChI as an identifier altogether.

    • JW – This scientifically makes sense. InChI is a super great identifier for a lot of cheminformatics comparisons, but in this case we should trust mapped SMILES/whatever relevant identifier distinguishes our intent in our datasets, even if it disagrees with InChI.

    • MT – Right, InChI wasn’t being used for that purpose, just to be an arbitrary identifier.
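A toy illustration of the one-to-many relationship described above, using made-up molecule IDs and plain dictionaries rather than the actual database code. It shows how keying by InChI can collapse entries that mapped SMILES keeps distinct. Whether this is what causes the 45 dropped records has not been confirmed.

```python
# Hypothetical rows: (molecule_id, inchi, mapped_smiles).
# Rows 1 and 2 are the same species (HCl) with different atom orderings.
rows = [
    (1, "InChI=1S/ClH/h1H", "[H:1][Cl:2]"),
    (2, "InChI=1S/ClH/h1H", "[Cl:1][H:2]"),
    (3, "InChI=1S/H2O/h1H2", "[H:1][O:2][H:3]"),
]

# Keying by InChI: the second HCl row overwrites the first.
by_inchi = {inchi: mol_id for mol_id, inchi, _ in rows}

# Keying by mapped SMILES: one entry per molecule ID.
by_mapped_smiles = {smiles: mol_id for mol_id, _, smiles in rows}

collapsed = len(rows) - len(by_inchi)
print(len(by_inchi), len(by_mapped_smiles), collapsed)
```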


 

Action items

Decisions