2021-04-26 Core Developers meeting notes

Participants

  • @Jeffrey Wagner

  • @Lily Wang

  • @Iván Pulido

  • @Pavan Behara

  • @Simon Boothroyd

  • @Matt Thompson

  • @Trevor Gokey

  • @David Dotson

Discussion topics

Item

Notes

Roundtable updates

  • SB

    • Fragmenter port is finished; it’s basically a complete rewrite. Largely pydantic-based structure, updated to use the common package structure. Regression tests are in progress (JH will be working on these). We don’t expect this to exactly reproduce the previous fragmenter; instead it should behave similarly.

    • Significant work on bespokefit and qcsubmit. bespokefit used to have its own Results object that copied a bit of code/data from QC* ecosystem, so we’ve replaced those with objects that reference data in QCA, and functions that offer a lot of control over how/when those are pulled down. Lots of filters added as well for things like SMARTS and charge. These new Results objects are being integrated into BespokeFit, which will hopefully make using it for full FF fits a bit more straightforward.

      • JW – Was bespokefit using pydantic-backed factories before?

      • SB – Mostly. Some design patterns were changed, e.g. immutable fields were made class attributes. There was some confusion over instances vs. types. It was already pretty good.

    • IP – AFAIK in pydantic, we can’t expose __methods to users. Is there some way we can add this anyway?

      • SB – I don’t think there are any restrictions. Private methods should be totally supported (single underscore), and double underscore should be good as well.

    • DD – Should we expose pydantic objects in the public API? (as opposed to hidden in a _data attribute)

      • SB – I think they should be exposed in the API, so not hidden behind a _data attribute. In QCA I’ve seen that there’s a somewhat complex layer of getters above the data model, whereas I find it easier to reach right in.

      • DD – I guess it depends on the complexity of the object being described.

      • SB – Also, if the data pydantic model is private, we can’t use tools like FastAPI.

      • DD – It should work if you make simple wrappers on the client side. Though it will depend on the complexity of the objects.

      • SB – I’ve done this by making+exposing composite objects at the points where I need the API.

    • DD – One thing about making the data model also be the API is that it makes all internal structure changes be effectively “API breaking”.

      • SB – That’s not necessarily the case, since you can offer getters and setters to replicate the previous structure.
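SB’s point about getters and setters can be sketched in plain Python. This is only an illustration under stated assumptions, not code from BespokeFit or QCA: the class `ResultRecord` and the field names `record_id` and `mol_id` are hypothetical, and a standard-library dataclass stands in for a pydantic model.

```python
from dataclasses import dataclass


@dataclass
class ResultRecord:
    """Hypothetical data model after a rename: `mol_id` became `record_id`."""

    record_id: str
    charge: int = 0

    # Compatibility getter: the old attribute name still works for reading.
    @property
    def mol_id(self) -> str:
        return self.record_id

    # Compatibility setter: the old attribute name still works for writing.
    @mol_id.setter
    def mol_id(self, value: str) -> None:
        self.record_id = value


record = ResultRecord(record_id="td-001")
record.mol_id = "td-002"  # client code written against the old structure
print(record.record_id, record.mol_id)
```

The internal field can then change without breaking callers, at the cost of maintaining the shim, which is the trade-off DD raised.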

  • MT

    • Short week last week – Vaccine part 2 on Wednesday.

    • Lots of meetings on interoperability on Monday.

    • Working on getting exporters to give the same nonbonded energies, e.g. exporting to both OpenMM and GROMACS and trying to get the same energy from the different engines. Only made a little progress. Lots of difficulty from the different bundling of energies/settings in different data formats/objects.

  • IP

    • Working on meeting with Perses team to spec Topology refactor to their needs. We had some molecule-modification functions in mind, and they asked for some kinda different things. So we’re working on an aspirational API that will satisfy their needs.

    • Studying pydantic and how it can fit our needs/where it would be appropriate.

    • Studying chemical components dictionary. Does it contain reactions?

      • LW – I haven’t heard about this containing reactions, but it would be great if they were recorded somewhere.

  • DD

    • Out Thurs+most of Fri from shot.

    • Submitted most of gen3 torsiondrive dataset. This is almost complete (in 5 days!)

    • Added compute-tag-based routing to qca-dataset-submission

    • Moved forward with QCSubmit #108, which should improve performance + consistency.

    • On the QCA side, trying to make all the dataset types look similar (previously torsiondrivedatasets were pretty different from other types)

    • Decided to move local optimizer and local torsiondriver from openff-benchmark to a new openff-gopt package.

    • Bill Swope from Genentech is helping us try out the Schrodinger benchmarking workflow. Some trouble with ffbuilder commands.

    • Protein-ligand benchmarks on F@H: Met with LD and DH last week and got a demo of how it operated previously. This will be an effort to get these running on F@H instead of on the Janssen cluster.

  • PB

    • Reading up on QM theory/benchmarks. Scoping out how to move forward with choosing methods for our needs – Which functionals to use and their relative performance. This is still a work in progress.

  • LW

    • Worked on architecture for experiments involving residue-based FFs.

    • Started accumulating code in a new repo:

    • If the APIs for bespokefit+fragmenter are becoming stable, I’d like to know when to start using them.

      • SB – Fragmenter should be stable now. BespokeFit is going to change a lot soon.

  • TG

    • Working on fixing vsites now that we’re putting them into recharge.

    • There’s a PR for calculating vsite coordinates in the OFFTK API. Added tests last night, so now it should just need the release notes updated.

      • JW – This seems to be in good shape. I’ll think about whether it’s necessary to compare for exact matching between OpenMM’s calculated vsite location and OpenFF’s calculated vsite location.

    • TG – One difference between our and OpenMM vsites is the use of weights – OFFTK previously allowed access to vsite weights in the init function, even though they weren’t in the spec.

      • JW – I’ll check over this in the previous and new API.

    • TG – In the docs, I started putting rst links to openmm documentation. Is that OK?

    • TG – DD, I noticed that on qca-dataset-submission’s GH actions some jobs aren’t getting restarted. Specifically, about a week ago, there were timeouts in the GH actions logs.

      • DD – I haven’t seen any timeouts in error cycling for the last 20 days.

      • JW – Maybe this is the public benchmarking set?

      • TG – Sometimes after an error cycling, we see ~100 errored jobs immediately.

      • DD – This may be the result of one of my changes. I changed a glob in the code to do something different with a wildcard, but for a while the codebase was in a state that missed compressed file suffixes.

        • TG – This is probably what I had seen before. I’d done a manual restart and will keep my eyes open for if it starts happening again.
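The kind of glob problem DD describes can be illustrated with a small sketch. The actual pattern used in qca-dataset-submission is not recorded in these notes, so the patterns and file names below are assumptions chosen only to show how a wildcard can miss compressed file suffixes.

```python
from fnmatch import fnmatchcase

# Hypothetical file names; the real submission files may be named differently.
files = ["dataset.json", "dataset.json.bz2", "dataset.json.xz", "README.md"]

# A pattern that ends at ".json" skips compressed variants of the same file.
narrow = [f for f in files if fnmatchcase(f, "dataset*.json")]

# A trailing wildcard also picks up suffixes such as ".bz2" and ".xz".
wide = [f for f in files if fnmatchcase(f, "dataset*.json*")]

print(narrow)
print(wide)
```

Jobs attached to files that the narrower pattern skips would never be picked up by error cycling, which would be consistent with the missed restarts TG saw.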

  • JW

    • 0.9.2 toolkit release. Adds functionality for loading partially mapped SMILES.

    • Working with Iván on speccing out new Topology API. Interoperability meetings with Perses and MosDef. Importantly, we’ll be adding a new class, provisionally named AtomTypedMolecule, which will basically be a networkx graph with almost entirely optional data fields for each atom/bond.

      • TG – Would be interesting to see if this will overlap with my bit vector work. I’ve been making a lot of mapping functions, since I often need something more flexible than exact isomorphisms.

      • JW – We may support molecule substructures in some way, but this isn’t our top priority, so check in in a week or two if you’d like to follow how this is coming.

    • Infrastructure space reorganizing/labeling

      • DD – Can we do the same “create meeting notes with labels” approach for Public Space?

      • JW – @Karmen Condic-Jurkic knows how to set this up – I’ll bring this up next time we talk

    • Docs and examples – Josh Mitchell has been working on splitting examples into their own CI, putting examples into a dedicated package. There will be more movement on this in the next few weeks.

    • This week, I’m hoping to tackle nonbonded force splitting, settling on a meaning of switch_width, charge rounding w/ ambertools backend. Let me know if there are other nonbonded changes, since I don’t want the numerical outputs to change frequently.

      • @Jeffrey Wagner – NOTE TO SELF: Test whether virtualsites behave nicely with the force splitting approach. And in general that these changes will work nicely with inheritance.

    • JW – Had a feasibility chat with Della Corte lab about joining a grant for ionic liquid simulations. Seems interesting. SB’s smirnoff_plugins repo really helped as an example of how to use custom nonbonded forces.

      • MT – Be careful about this, ionic liquids are kinda the wild west of infrastructure/method irreproducibility. I’d like to be involved in future meetings.

    • SB – Re: Atom maps – Is there more thought about making them first-class citizens in the toolkit? Right now they’re in molecule.properties, and some methods respect them, but others don’t (e.g. to_rdkit). So we should either make them first-class (i.e. functions should respect and retain them) and keep them in properties, or we make them first-class and put them as attributes of atom.

      • JW – I’d like the latter approach, where we make it an attribute. But it seems strange to me that we’d only offer one map index per atom – Would we want to support arbitrary numbers of map indices?

      • SB – One example of wanting multiple would be in merging torsiondrive fragments into a single parent molecule and trying to keep track of the different fragment indices, but this may not be totally necessary.

      • JW – Let’s collect feedback on how map index attributes would look here:



Return for sprint planning at 6 past the hour

Action items

Decisions