Multi-conformer torsiondrives not supported |
LW – Possibly the most important thing for me right now. I know you’re working on an ICRMSD solution; what is the status of that?
MT – ICRMSDs are a mess, and I’m not sure we’ve made progress in the past few days. The RDKit solution out of the box wasn’t great. MDAnalysis would be the next one to explore. Not sure what a “reference” implementation is here.
LW – I think MDAnalysis might be helpful here. We’d talked about this in a SMIRNOFF meeting a long time ago, and found that… But the important thing is to use the same measurement method for both molecules.
JW – Two major issues I recall: (1) measurement issues (±180°, dihedral handedness, that kind of thing) and (2) methyl rotation/graceful handling of symmetries.
LW – How bad was the geomeTRIC solution?
MT – Unsure.
LW – I don’t think we had issues with geomeTRIC RMSDs. My big issue was looking at outliers by assigned parameter. But that’s when I found that geomeTRIC would interpret the topology differently for different conformers.
MT – Do we think geomeTRIC is doing spooky xyz2mol-style things to infer the topology?
MT – It seems like geomeTRIC does fine on a population scale, but when we look at individual cases it gets ugly.
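As an illustration of JW’s first point, a common way to avoid ±180° artifacts is to wrap each dihedral difference into [-180, 180) before squaring. The Python below is a minimal sketch, assuming dihedrals have already been measured in degrees for the same atom quartets, with the same sign convention, in both geometries; `wrapped_dihedral_difference` and `dihedral_rmsd` are hypothetical helpers, not functions from geomeTRIC, RDKit, or MDAnalysis. It does not address the symmetry/methyl-rotation issue.

```python
import math
from typing import Sequence


def wrapped_dihedral_difference(a: float, b: float) -> float:
    """Return a - b in degrees, wrapped into [-180, 180).

    Without wrapping, 179 and -179 look 358 degrees apart
    instead of 2 degrees apart.
    """
    return (a - b + 180.0) % 360.0 - 180.0


def dihedral_rmsd(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """RMSD over matched dihedrals (degrees), using periodic differences.

    Assumes both sequences list the same atom quartets in the same order,
    measured with the same sign convention for both molecules.
    """
    if len(reference) != len(candidate):
        raise ValueError("dihedral lists must be the same length")
    if not reference:
        raise ValueError("need at least one dihedral")
    squared = [wrapped_dihedral_difference(r, c) ** 2 for r, c in zip(reference, candidate)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    print(wrapped_dihedral_difference(179.0, -179.0))  # -2.0
    print(dihedral_rmsd([179.0, 60.0], [-179.0, 60.0]))
```

Using the same wrapped-difference function for both QM and MM geometries also satisfies LW’s point that the measurement method must be identical for both molecules.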
LW – If we don’t do fancy outlier detection but do want to make the ICRMSD plots like we did in previous FF releases, would YAMMBS be able to do that? Back to the top-level issue in this cell: YAMMBS can now deal with torsiondrives, but we’ve never tested it on multi-conformer-seeded torsiondrives. Those should be treated as DIFFERENT torsiondrives, EACH with a minimum optimization at each grid point.
MT – So we should identify torsiondrives by their QC record ID or something similar, so that the same molecule could have multiple torsiondrives.
LW – This isn’t a blocker for 2.3.0, but it is important for 2.3.0.
MT – Glad this isn’t a blocker for 2.3.0, but this does make the previous torsion analysis pretty suspect.
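A minimal sketch of MT’s suggestion, in Python: key each torsiondrive by its record ID so that several conformer-seeded drives of the same molecule stay separate. `TorsionDriveRecord`, `index_by_record_id`, and `group_by_molecule` are hypothetical, simplified stand-ins, not QCPortal or YAMMBS types.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TorsionDriveRecord:
    """Hypothetical, simplified stand-in for one torsiondrive record."""
    record_id: int
    mapped_smiles: str
    # Grid angle (degrees) -> energy of the optimized geometry at that grid point.
    energies: Dict[float, float] = field(default_factory=dict)


def index_by_record_id(records: List[TorsionDriveRecord]) -> Dict[int, TorsionDriveRecord]:
    """One entry per torsiondrive, even when several drives share a molecule."""
    indexed: Dict[int, TorsionDriveRecord] = {}
    for record in records:
        if record.record_id in indexed:
            raise ValueError(f"duplicate record ID {record.record_id}")
        indexed[record.record_id] = record
    return indexed


def group_by_molecule(records: List[TorsionDriveRecord]) -> Dict[str, List[int]]:
    """Molecule -> record IDs of all its torsiondrives, each kept separate."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        groups[record.mapped_smiles].append(record.record_id)
    return dict(groups)


if __name__ == "__main__":
    smiles = "[C:1]([H:3])([H:4])([H:5])[O:2][H:6]"
    # Two drives of the same molecule, seeded from different conformers.
    drives = [TorsionDriveRecord(101, smiles), TorsionDriveRecord(102, smiles)]
    print(len(index_by_record_id(drives)))              # 2
    print(group_by_molecule(drives))
    # Naively keying by molecule collapses the two drives into one.
    print(len({d.mapped_smiles: d for d in drives}))    # 1
```

Grouping by molecule is still available for per-molecule summaries, but each drive’s grid of minimizations is analyzed on its own.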
|
InChI mapping bottleneck |
LW – This is only a blocker in that it’s slow. I’m also not sure it’s a necessary step in our process.
MT – This may just be historical, a remnant of previous use of multiprocessing.
LW – It looks like, after the code groups things, it flattens them out.
MT – … It looks like the way the PR interacts with the database is a strict improvement.
LW – But somehow these changes drop 45 records, and I don’t know why.
MT – Molecule IDs map one-to-one to mapped SMILES, but the same chemical species (InChI) can have a one-to-many relationship to mapped SMILES. So I should try dropping InChI as an identifier altogether.
JW – This makes sense scientifically. InChI is a great identifier for many cheminformatics comparisons, but here we should trust mapped SMILES (or whichever identifier captures the intent in our datasets), even if it disagrees with InChI.
MT – Right, InChI wasn’t being used for that purpose, just as an arbitrary identifier.
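The mechanism MT describes can be shown with a toy example: if molecule IDs are one-to-one with mapped SMILES, but several mapped SMILES share one InChI, then any lookup keyed on InChI silently keeps only one entry per species. This Python sketch uses hypothetical names (`MoleculeEntry`, `index_by`, `lost_molecule_ids`) and illustrative entries; whether this explains the 45 dropped records is not established here.

```python
from typing import Dict, List, NamedTuple


class MoleculeEntry(NamedTuple):
    """Hypothetical row: one molecule ID per mapped SMILES, plus its InChI."""
    molecule_id: int
    mapped_smiles: str
    inchi: str


# Illustrative data: entries 1 and 2 are both methanol with different atom maps,
# so they have distinct mapped SMILES (and molecule IDs) but the same InChI.
EXAMPLE_ENTRIES = [
    MoleculeEntry(1, "[C:1]([H:3])([H:4])([H:5])[O:2][H:6]", "InChI=1S/CH4O/c1-2/h2H,1H3"),
    MoleculeEntry(2, "[O:1]([H:6])[C:2]([H:3])([H:4])[H:5]", "InChI=1S/CH4O/c1-2/h2H,1H3"),
    MoleculeEntry(3, "[C:1]([C:2]([O:3][H:9])([H:7])[H:8])([H:4])([H:5])[H:6]",
                  "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
]


def index_by(entries: List[MoleculeEntry], key: str) -> Dict[str, MoleculeEntry]:
    """Build a lookup keyed on one field; a later entry overwrites an earlier one with the same key."""
    return {getattr(entry, key): entry for entry in entries}


def lost_molecule_ids(entries: List[MoleculeEntry], key: str) -> List[int]:
    """Molecule IDs that disappear when the entries are indexed by `key`."""
    kept = {entry.molecule_id for entry in index_by(entries, key).values()}
    return sorted(e.molecule_id for e in entries if e.molecule_id not in kept)


if __name__ == "__main__":
    print(lost_molecule_ids(EXAMPLE_ENTRIES, "inchi"))          # [1]
    print(lost_molecule_ids(EXAMPLE_ENTRIES, "mapped_smiles"))  # []
```

Keying on the mapped SMILES (equivalently, the molecule ID) loses nothing, which matches JW’s point that the dataset’s own identifier should win over InChI.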
|