2021-12-20 Core Developers meeting notes

Participants

@Jeffrey Wagner
@Chapin Cavender
@Pavan Behara
@David Dotson
@Matt Thompson

Discussion topics

Item	Notes

General updates

No core-devs next week.

Individual updates

DD –
- Some QC manager shenanigans on PRP and lilac. PRP seems to have DNS issues. Lilac jobs aren’t dying gracefully, and I’m working on some fixes to QC* to remediate this, but they’ll likely need to wait until the Jan release. BP reached out to me this morning about the latter, and we may push a fix sooner if needed. In the meantime I’m working on debugging exactly what’s going on with Lilac or how we could get a patch that works without needing to change the source code.
- JW – If a quick fix may be pushed, should we coordinate with the people running OpenFF compute to shut down compute or check in over the holidays in case an update is needed?
  - DD – No, since the problems are specific to Lilac and PRP, we should keep as many managers running on the other clusters as possible.
- DD – On Fri, we learned that we’d run out of storage on QCA if we do wavefunctions for all pubchem sets. So I’m going to reach out to Chodera and Eastman to try out the wavefunctions from pubchem set 1, and see whether they actually need them for the other sets and ask them for recommendations for how to resolve that.
- DD – Regardless of the above, we will need a new server for the central QCA. I’m working with Pritchard on this and iterating on feedback.
  - JW – I’m fairly negative about the idea of using AWS for QCA - Even if we get grants for a few years, we could end up ballooning up the storage requirements, and then we’d be on the hook for a massive bill if they ever didn’t renew our AWS grant.
  - DD – In that case we’d need to buy the local compute I recommended, which puts us back at step 1. I think the local machine would cost >=$20k. So this is basically needed because we can only handle 8 simultaneous connections, whereas with our own host we could do a lot more.
- DD – Pubchem sets are our largest yet. Previously, the industry set was our largest at 70k. But the pubchem sets put together are ~1 million. And it’s really the wavefunctions that are most of the cost.
  - DD and JW will meet later today to discuss how to productively engage with MolSSI and how to reduce DD’s QCA workload
- DD – I met with Lorenzo last week and touched base on paper writing and on molecules whose Kekule structure gets changed when they are processed by the Schrodinger stack, which breaks downstream RMSD analysis.
- DD – (showed architecture diagrams) Making some progress on F@H stuff, work will continue into 2022.
- PB – Re: Kekule structure rearrangement - Is this a common problem in how we parameterize a molecule?
  - DD – The context where this came up was RMSD comparisons between molecules optimized using OpenFF+OpenMM and those optimized by OPLS, where OPLS picks a different Kekule form than is in the input. Lorenzo found that this can be resolved by something like https://www.rdkit.org/docs/source/rdkit.Chem.MolStandardize.rdMolStandardize.html
  - JW – There are two problems that may happen here: giving the same “molecule” different parameters based on its input Kekule structure, and this specific issue we see in benchmarking, where Schrodinger software puts a different Kekule structure on the output compared to what was in the input.
  - PB – We have a few cases where things like nitro groups may get different params based on their input structures. I’ll present these at the next ff-release call in Jan and ask whether the observed behavior is correct.
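To make the Kekule issue above concrete: two Kekule forms of the same aromatic ring share connectivity but place the double bonds on different atom pairs, so any comparison that keys on bond orders treats them as different molecules. The following is a toy sketch using only the Python standard library (ring carbons of benzene only; `kekule_ring` is a hypothetical helper, and a real workflow would rely on a cheminformatics toolkit such as the RDKit module linked above rather than this):

```python
def kekule_ring(first_double):
    """Six-membered ring bonds as {(i, j): order}, with alternating double bonds."""
    bonds = {}
    for i in range(6):
        j = (i + 1) % 6
        # Every second ring bond is double; `first_double` (0 or 1) picks the phase.
        order = 2 if i % 2 == first_double else 1
        bonds[tuple(sorted((i, j)))] = order
    return bonds


form_a = kekule_ring(0)  # double bonds on 0-1, 2-3, 4-5
form_b = kekule_ring(1)  # double bonds on 1-2, 3-4, 0-5

# A bond-order-aware comparison sees two different molecules...
assert form_a != form_b
# ...while connectivity alone shows they are the same ring.
assert set(form_a) == set(form_b)
```

This is why an atom mapping built from bond orders can fail between input and output structures even though the chemistry is unchanged.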
CC –
- Spent a lot of time working on QC datasets. Began analyzing plots from 2D torsiondrives on dipeptides. In doing this, I found that QCA wasn’t correctly constraining sidechain dihedrals. DDotson and JHorton helped resolve this and resubmit as a v1.1 dataset. This looks good so far!
- Showed the (accidentally unconstrained) torsiondrives at the last biopolymer-ff call and we agreed that this is going fast/well enough that we can submit the full set of residues.
- Trying to figure out what to do for disulfides and degrees of freedom. Tentatively thinking of freezing the backbone of one of the cysteines into something like an alpha helix and driving the one on the other side of the disulfide bond.
- Still chasing people down to get LiveCOMS sections so we can have a complete draft in Jan. Then we’ll open this to review by OpenFF PIs and members. Then we could submit to the journal as early as Feb or March.
MT –
- Iterated with MOsato (Mobley lab tech) on biopolymer infrastructure + interchange testing. Meghan is working on using OpenMM’s PDBFixer to solvate input structures and otherwise prepare inputs.
- Did some smaller PRs with interchange, e.g. added interchange.to_pdb
- Worked a lot on Foyer interface, lots of little conveniences and edge case handling.
- Endianness PR to safely serialize numpy arrays in json and transfer between machines.
- Sometimes topology gets modified by system creation in confusing ways. E.g., TopologyBond partial bond orders sometimes don’t point back to the correct value in the reference molecule. But maybe this isn’t super important because the topology refactor could remove the whole TopologyBond class.
  - JW – Agree that the topology refactor will probably fix this about as quickly as we could put a fix into a 0.10.x release, so let’s plan on pushing people to use the 0.11 RC package in Jan if this is blocking them.
- I’m half-time Mon-Thurs this week (mornings CST), off this Friday and out all next week.
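The endianness item above comes down to recording byte order explicitly whenever binary array data is embedded in JSON, so that a machine with a different native order decodes the same values. The following is a minimal sketch using only the Python standard library. It is not the code from the PR: `encode_floats`, `decode_floats`, and the JSON field names are hypothetical, and the `"<f8"` tag simply borrows NumPy's dtype string for little-endian float64.

```python
import base64
import json
import struct


def encode_floats(values):
    """Pack floats as little-endian float64 and wrap them in a JSON string."""
    # "<" forces little-endian regardless of the host machine's native order.
    raw = struct.pack(f"<{len(values)}d", *values)
    payload = {
        "dtype": "<f8",  # the byte order travels with the data
        "length": len(values),
        "data": base64.b64encode(raw).decode("ascii"),
    }
    return json.dumps(payload)


def decode_floats(text):
    """Recover the floats on any machine, using the recorded byte order."""
    payload = json.loads(text)
    if payload["dtype"] != "<f8":
        raise ValueError(f"unsupported dtype: {payload['dtype']}")
    raw = base64.b64decode(payload["data"])
    return list(struct.unpack(f"<{payload['length']}d", raw))


charges = [-0.834, 0.417, 0.417]
assert decode_floats(encode_floats(charges)) == charges
```

Storing the dtype next to the bytes means the reader never has to guess the writer's architecture.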
PB –
- Looked at differences in protonation states at physiological pH (of 7.4) on molecules in training and benchmark sets. Around 75% have at least one reasonable protomer, around 25% have a different state. Will be adding an optimization dataset with those additional protonation states for molecules.
  - JW – Very cool.
  - PB – (shows some examples of protonation states, phosphoric acids being protonated when they wouldn’t really be in physiologic conditions, different representations of nitro groups with quadrivalent vs. pentavalent N)
- Working sessions with Jessica on how to do fitting on the cluster.
- Syncing with Meghan and Lorenzo on the torsion multiplicity project, will be meeting Meghan today for some subset analysis.
- Some qca work.
- Comparison of WBOs from OE-AM1, AT-AM1 and GFN2-XTB. Had a discussion about whether we could replace AM1 use with XTB, will open to more folks at the next FF release meeting.
  SB mentioned that JW and CD were looking into ELF methods for AM1 implementations. What’s the status of that?
  - JW – We basically found that using “ELF1” to find a conformer gives better agreement between AT and OE. So we’re interested in implementing this but don’t have effort available for the next few months.
  - PB – I’d be interested in trying to implement this.
  - JW – I’ll write down a spec for this feature and ping you, PB
JW –
- Worked on biopolymer cleanup (refactoring Molecule.from_pdb code to use either ToolkitWrapper and for logic to be in appropriate places). Also thinking about good architecture for the whole workflow. Ultimately we want something like Topology.from_pdb, but this will require some dancing around with OpenMM to load hierarchy info, recognize residues from PDB info and add CONECTs, then NetworkX to identify individual mols and add chemical info. So Topology.from_pdb will need to identify each component in the input and figure out whether it’s a small molecule from the unique_molecules kwarg, or whether it is a polymer and should have chemical info assigned by a substructure library. And Molecule.from_pdb will need to detect whether the input is really more than one molecule and raise an error if so.
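The molecule-splitting step described above can be sketched with a plain graph traversal. This is a hedged illustration, not the toolkit's code: it uses a standard-library breadth-first search in place of NetworkX, and `split_into_molecules`, `require_single_molecule`, and the atom-count/bond-pair input format are hypothetical stand-ins for whatever the PDB loader actually produces.

```python
from collections import deque


def split_into_molecules(n_atoms, bonds):
    """Group atom indices into connected components (one per molecule)."""
    neighbors = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    molecules = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Breadth-first search collects every atom reachable from `start`.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for nbr in sorted(neighbors[atom]):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        molecules.append(sorted(component))
    return molecules


def require_single_molecule(n_atoms, bonds):
    """Mimic the proposed Molecule.from_pdb check: reject multi-molecule input."""
    molecules = split_into_molecules(n_atoms, bonds)
    if len(molecules) != 1:
        raise ValueError(f"expected 1 molecule, found {len(molecules)}")
    return molecules[0]


# Two waters: atoms 0-2 and 3-5, with O-H bonds only within each water.
bonds = [(0, 1), (0, 2), (3, 4), (3, 5)]
print(split_into_molecules(6, bonds))  # [[0, 1, 2], [3, 4, 5]]
```

In this sketch, a Topology-level loader would go on to classify each component (small molecule vs. polymer), while the Molecule-level loader would only call the single-molecule check.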
