2022-03-21 Core Developers meeting notes

Participants

  • @David Dotson

  • @Pavan Behara

  • @Chapin Cavender

  • @Matt Thompson

  • @Jeffrey Wagner

  • @Diego Nolasco (Deactivated)

  • @Simon Boothroyd

Discussion topics

Item

Notes

General Updates

  • Karmen activated an OpenFF AWS account. I hope to only use this in exceptional circumstances.

Individual Updates

  • CC

    • Met with DD and PB to talk about dipeptide torsiondrives on QCA. Progress showed a lot of “bursting” behavior as we error cycled, with many errors coming from certain hosts, mainly one on PRP. Other jobs failed due to bad geometry, probably caused by a Sage pre-optimization step that creates a bogus geometry. Because we know which molecules in this dataset produce bad geometries, we can use them as a sort of unit test for a step that detects these bad geometries.

      • JW – Sometimes we see bonds breaking in QM optimization. Also, if there are technical issues in MM optimization, atom indexing can get scrambled, which shows up as apparently highly stretched bonds.

      • CC – This mostly happens with AAs with large sidechains, when you’re driving backbone torsions, and they run into each other and cause tangles/steric clashes.

      • JW – Maybe a useful breadcrumb for when we’ve done this before:

    • Met with JW and DN to set up a formal decision-making process for the protein FF project. Borrowed structure from the Interchange and F@H project plans. Decided that we’ll only make binding decisions in the biopolymer FF call. Planning for approvers to be myself, MG, and MS.

    • Continuing to put together a plan for benchmarking the protein FF. I heard from JC that the benchmark that we want to do doesn’t require the new F@H interface, and can be handled by existing F@H machinery. So I’ll get in contact with Sukrit to begin verifying that this will work. I’ll talk about this course of action at this week’s protein FF call to gain agreement on pursuing it.

  • MT

    • Largely spent the week getting deep into hard problems.

    • SMIRNOFF-EP 00005 progressed. This will change the SMIRNOFF spec, but it won’t change how you interact with FFs/software (basically the software behavior was non-spec before, and we’re updating the spec to reflect what the software does). The core issue is what happens when PME electrostatics are requested by the FF but the topology is not periodic.

    • Added non-OpenEye runs for Interchange CI. Found new failures. The core of the issue is that a 0-step calculation in sander DOES update the coordinates because it satisfies distance constraints.

      • CC – Is this true of commandline sander, or the pysander python module? The latter may offer tighter control/may be better maintained. I think it’s just a wrapper but I’m not sure. There’s a section in the AMBER manual about it.

      • CC - section 21.14.4 (page 417) of the current Amber manual describes python bindings around the sander API: https://ambermd.org/doc12/Amber21.pdf#subsection.21.14.4

    • Poked around in some runtime performance issues, some in the Toolkit and some in Interchange. One thing to keep in mind for the Toolkit: in openmm units, distance=4*nanometer takes a few microseconds, but in pint it takes a few milliseconds. This rears its head in something like from_openeye when you load multi-conformer molecules. I’ve opened a PR to fix a performance regression in from_openeye, but I suspect there are other places where this causes a significant performance problem. If you run into slow runtime in other areas, this is a good place to look.

    • Someone made a gromacs recipe on conda-forge, and it works well. So be aware that envs that need gromacs and search both the bioconda and conda-forge channels will now pull gromacs from conda-forge. There are MPI and non-MPI builds: with the former the executable is called gmx_mpi, otherwise it’s gmx.
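Scripts that call gromacs may want to cope with either executable name. A minimal sketch, assuming only that the binary is on PATH under one of the two names mentioned above (the helper name find_gromacs_executable is hypothetical, not part of any OpenFF package):

```python
import shutil


def find_gromacs_executable(candidates=("gmx", "gmx_mpi")):
    """Return the full path of the first candidate found on PATH, or None.

    The candidate names follow the note above: non-MPI builds install
    "gmx", MPI builds install "gmx_mpi".
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

Callers would then pass the returned path to subprocess rather than hard-coding "gmx", and raise a clear error when the result is None.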

    • Weeks ago I reached a “done” state for regression testing single-molecule energies for the interchange adoption project. This was achieved using the candidate toolkit with use_interchange={true,false}, but when I actually test a candidate build against a completely separate environment with 0.10.X, I see differences in the AM1BCC charges. To get the entire dataset to pass I have to allow tolerance up to 1e-3. Of the 10-20 molecules I’d test, 80 or 90% would pass with agreement to 1e-6, but there were a few that had at least one atom with errors at 1e-4 or 1e-3.
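The tiered comparison described above could be sketched roughly as follows. This is only an illustration, not the actual regression-test code: the function names and the "tight"/"loose"/"fail" labels are made up here, and the two thresholds are simply the 1e-6 and 1e-3 figures quoted in the notes.

```python
def max_charge_deviation(reference, candidate):
    """Largest absolute per-atom difference between two charge lists."""
    if len(reference) != len(candidate):
        raise ValueError("charge lists must have the same number of atoms")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def classify_agreement(reference, candidate, tight=1e-6, loose=1e-3):
    """Label a molecule by the worst per-atom charge deviation.

    "tight": all atoms agree to within `tight`
    "loose": at least one atom exceeds `tight`, but none exceeds `loose`
    "fail":  at least one atom exceeds `loose`
    """
    deviation = max_charge_deviation(reference, candidate)
    if deviation <= tight:
        return "tight"
    if deviation <= loose:
        return "loose"
    return "fail"
```

Reporting the "loose" molecules separately, rather than only raising the global tolerance, keeps the handful of 1e-4 to 1e-3 outliers visible.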

    • More work on vsites. The main effort on this is waiting on a test set from JW.

  • DN

    • Worked with CC last week on setting up a project plan. Afterwards, DMobley invited me to participate in the general FF-release meetings. At the FF release meeting, we came up with the idea of structuring plans for all of our efforts. This will make it easier for the ad board and the PIs to understand what work is ongoing, and we’ll have some defense against scope changes. So folks who are starting up a new effort and want help getting this established should feel free to talk to me.

    • I’m still working on a strategic planning document. I’ve mentioned this before and I think that this will be a big help to control our scope of work.

  • DD

    • Protein-Ligand Benchmarks (fah-alchemy)

      • presented data model proposal to working group; great feedback, as well as detailed discussion on implications of certain model choices

      • worked with Jeff to explore question of where FF information goes in an alchemical graph

        • arrived at the conclusion that treating FF information as a property of the transformation protocol (edge) makes the most sense

        • will present our full findings to working group tomorrow

      • proceeding with design iterations on skeletal objects
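To make the "FF on the edge" conclusion concrete, here is a rough sketch of what such a data model might look like. It is not the fah-alchemy design: every class and field name below is hypothetical, and the force field is stored as a plain string for simplicity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChemicalSystem:
    """A node: describes what is simulated, with no force field attached."""
    name: str


@dataclass(frozen=True)
class Protocol:
    """How a transformation is carried out, including the force field."""
    name: str
    force_field: str


@dataclass(frozen=True)
class Transformation:
    """An edge: connects two systems and carries the protocol."""
    start: ChemicalSystem
    end: ChemicalSystem
    protocol: Protocol


@dataclass
class AlchemicalNetwork:
    edges: list = field(default_factory=list)

    def add(self, edge):
        self.edges.append(edge)

    def nodes(self):
        """All systems that appear at either end of any edge."""
        return {s for e in self.edges for s in (e.start, e.end)}

    def force_fields(self):
        """All force fields used anywhere in the network."""
        return {e.protocol.force_field for e in self.edges}
```

With this layout the same pair of nodes can be joined by two edges that differ only in force field, which is one way to compare FFs without duplicating the chemical systems.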

    • QCArchive

      • addressed compute errors (something’s up with PRP, no clue what the issue was, changed tags to keep dipeptide jobs from going to PRP) and poor geometry starting points in protein capped set; set may be getting close to complete, with only systematic errors remaining

        • worked with Chapin and Pavan to identify these; Chapin will develop a filter for geometries that catches these known cases
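One possible starting point for such a filter is a stretched-bond check against the known connectivity. This is only a sketch, not the filter Chapin will write: the function name is hypothetical, coordinates are assumed to be in angstroms, and the single 2.5 Å cutoff is an illustrative assumption (a real filter would likely use per-element-pair thresholds).

```python
import math


def find_stretched_bonds(coords, bonds, max_length=2.5):
    """Return (i, j, distance) for bonded pairs farther apart than max_length.

    coords:     list of (x, y, z) tuples, assumed to be in angstroms
    bonds:      iterable of (i, j) atom index pairs from the connectivity
    max_length: illustrative cutoff in angstroms (an assumption)
    """
    stretched = []
    for i, j in bonds:
        distance = math.dist(coords[i], coords[j])
        if distance > max_length:
            stretched.append((i, j, distance))
    return stretched
```

The molecules already known to produce bad geometries could serve as test cases: each should return a non-empty list, while a well-behaved conformer should return an empty one.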

      • SPICE sets proceeding, pubchem set 2 nearly a quarter complete

    • Made a new QCEngine release. SB, I know this came out a little late timing-wise for your work with BSwope - Was that OK?

      • SB – The timing worked well. We’ll follow up with him in the coming days/weeks to get feedback.

  • PB

    • Helped JM set up fits on the cluster and showed how to benchmark them.

    • Joined torsiondrives debug with CC and DD.

    • Not much to update on work side.

  • JW

    • Vsite-palooza. Making geometry and energy test cases for a few molecules and vsite FFs. Visual inspection revealed/confirmed problems. Opened bug reports. Will send notebook to MT today.

    • EP 00005b review - Seems acceptable, will update text today.

    • “FFs on nodes” discussion with DD.

  • SB

    • There’s a repo under my name - “GNN-charge-models” - which should probably be transferred to OpenFF org. That contains some scripts for pulling down qcarchive records, mass AM1BCC calcs, ESP handling, etc. This will be very useful for the person who takes over from me.

      • SB – When I do a project, I want the core to be a fully-released framework, but for a particular study I want the utility scripts and study data in another place, showing how I used the major framework.

      • JW – I’d approve this going into the OpenFF github org. I’d like to make sure that unpaid people aren’t on the hook.

      • SB – I’m concerned about the discoverability of these sorts of repos

      • JW – talked with JM about creating an index from say, a study repo, to the underlying library code that supports it

    • Looking into RESP. I’d always thought it was a well-specified, clear-cut method. It turns out that there isn’t a single publication that says how to do things (parameters, symmetry, conformer handling, etc.). The details seem to be spread over three different opaque publications. So I’m trying to get an implementation of RESP into openff-recharge, but it needs user testing and clarification on details. The point is that, if we’re going to train a model on AM1BCC anyway, then if we could just switch out the training data to be RESP we’d have better charges.

      • DD – Are there a finite number of RESP variations that you’ve identified?

      • SB – There’s a fortran script in AmberTools, and a supposedly-equivalent C script in AmberTools, that seem to be the dominant implementation. The really complicated thing is determining atomic equivalence, and how that impacts charge assignment (seems to be different handling for heavy atoms vs. Hs). There are also prep scripts that impact the outcome that I don’t fully understand.

      • CC – AMBER mailing list seems to think that the canonical implementation is the RED server.

      • SB – I’ve looked into RED (not RED server, just RED), and there are a few more details there. But it’s very complex and often requires reading the code directly to understand behaviors.
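For orientation, the objective usually quoted for RESP is a least-squares fit to the QM electrostatic potential plus a hyperbolic restraint on the charges. The form below is the commonly cited one and is not taken from this discussion. The symbols are V_i^QM for the QM potential at grid point i, r_ij for the distance from grid point i to atom j, and a and b for the restraint strength and tightness. Which atoms are restrained, the values of a and b at each fitting stage, and how equivalent atoms are constrained are exactly the details noted above as being scattered across sources, so they are left unspecified here.

```latex
\chi^2 = \sum_{i} \Bigl( V_i^{\mathrm{QM}} - \sum_{j} \frac{q_j}{r_{ij}} \Bigr)^2
       + a \sum_{j} \Bigl( \sqrt{q_j^2 + b^2} - b \Bigr)
```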

Action items

Decisions