2022-01-31 Core Developers meeting notes

Participants

@David Dotson
@Simon Boothroyd
@Matt Thompson
@Pavan Behara
@Chapin Cavender
@Jeffrey Wagner
@Diego Nolasco

Discussion topics

Item	Notes

General updates	SB – Offline at the beginning and end of Feb.
	MT – I’ll be testing this week and next, so I probably won’t need the next stage of approvals for another two weeks.
	SB – JW, feel free to bump me from the approver position if there’s a holdup.
Round table updates	SB
	Mostly working on GCNs. Building up the scripts needed to move forward, and building new molecules by hand to cover things that AM1BCC covers …
	The GNChargeModels GH repo is under my personal org; it should transfer to the OpenFF org.
	JW – This should be good to go whenever you’re ready.
	SB – I wasn’t sure if we had a policy for what goes under the GitHub org vs. what stays in individual repos.
	Many of my molecules had trouble converting Toolkit E/Z bond stereo to RDKit local stereo. I’ve opened a new PR to do this more robustly.
	JW – Same problem with OpenEye?
	SB – I don’t think so, we can set CIP stereo more directly with OE. This direct setting of E/Z in RDKit is a known shortcoming, mentioned on the mailing list.
	Also hitting size limits with some file storage, both on QCA and in research codebases. For example, I pulled down lots of QCA results, made ESPs from the results, and now I have these files that took a while to make that I’d like to store with my code.
	JW – Usually I’d reach for GH release assets. Could also consider using Amazon storage, but I’m not sure how the management and costs would scale here.
	SB – Could use Git LFS on a larger scale.
	DD – Who manages GH subscriptions/payments?
	MT + JW – Chodera’s credit card.
	DD – Should probably switch this over to OMSF billing.
	JW – I’m interested to know where Git LFS actually lives and how much it costs; will add “do a research cycle” to the sprint backlog.
	SB – Looks like $5/month for 50 GB.
	DD – Would this solve the bandwidth-constraint issue? (Are limited checkouts possible?) I had some experience with PLBenchmarks but could use a better understanding.
	SB – IIRC, there’s some scope for cherry-picking which files come down.
	DN
	Learning about finances, thinking about how different people see OMSF’s strategy. Right now I think it will act as a hub for different projects. Currently that means OpenFE and OpenFF; Interchange could be a spinoff.
	JW – I don’t think Interchange should be considered a spinoff. It’s so intertwined with our other infrastructure and science that it wouldn’t make sense for it to have a separate budget or management.
	MT – Largely agree. Right now Interchange is solidly under the OpenFF umbrella. In the future, there’s scope for it to become heavily involved in other parts of the comp chem world.
	DN – Gotcha. I’ve heard people talking as if there are two projects running, OpenFF and OpenFE. But there are really several significant efforts going on, including Interchange.
	JW – And bespokefit.
	DN – I’m just reading into the finances, happy to send a summary as I gain a better understanding.
	JW + SB – We’ve tried to stay separate from finance, since it introduces a lot of additional degrees of freedom. But happy to consult/advise on one-off questions if needed.
	CC
	Spent a lot of time on the LiveCoMS review. This is a review article on experimental datasets for benchmarking protein FFs, with contributions from a lot of external authors in our field. So I’m revising this to have a consistent voice and tone, and will send the next draft to the coauthors next week.
	I’ve got a sense of the consensus from this manuscript and looked into the infrastructure requirements to implement them. They all look reasonable, and should largely slot into Evaluator.
	I submitted an abstract to the Biophysical Society meeting in late Feb. I’m putting together a poster on biopolymers in OpenFF, basically “hey! we can handle biopolymers now”; will send out a draft by this afternoon and ask for feedback.
	Do we have a repo of common figures? I found OpenFF GDrive slides linked from Confluence, but the images are fixed size. Do we have any of them in a more scalable format?
	SB – I have several of my figures in SVG format, will share these with you. In general it would be beneficial to have a central repo of scalable graphics.
	CC – Where do we have the original of the big “openff workflow” figure?
	SB – I’m not sure.
	Maybe ask KCJ in #internal channel?
	JW – I’ll also check whether I have an original copy of that drawing.
	MT
	Spent most of my effort hacking away at single-molecule regression tests for the Interchange rollout. Found discrepancies between (0.11.0 rc exported with `use_interchange=True`) and (0.10.2 stable release) initially. Also tested against (0.11.0 rc exported using `use_interchange=False`). Found varying numbers of differences for all of these code paths. Then I tried to reproduce and was unable to find any differences between any code paths (except masses, because we had been changing the element package to one with different mass definitions).
	Will move on to vsites this week. Big question is whether all the vsite types can go out to AMBER.
	JW – The AMBER export question has big strategic/research direction implications. Not sure how to start this conversation in project strategy/Interchange planning.
	SB – Agree. This would be really helpful to understand. Since AMBER can do TIP4P and TIP5P I think our big needs may be covered. Our big need would be nitrogens in heteroaromatic rings (like a 5-membered aromatic ring with an N)… Figure 16.1 in the AMBER manual gives me hope that we’ll have support for what we need. I’ve pasted this image on the #core-devs slack channel.
	MT – I’m way less familiar with AMBER than I am with OpenMM/GROMACS, so this will require some learning from me. I’m optimistic that we can go to AMBER, but I don’t have much confidence without more research.
	MT – Could use a scientist(s) to consult on this. People mean different things when they say “support vsites”: one interpretation is “everything possible in the SMIRNOFF spec”, another is “just a few sorts of vsites are actually important to FF accuracy”.
	SB – I kinda think that the whole SMIRNOFF spec is important. Each type has a case where it’s physically justified. The most important ones would seem to be DivalentLonePair and TrivalentLonePair.
	MT – I’m more thinking about corner cases with permutations of the `match` argument, molecule symmetries, SMIRKS with wildcards.
	JW – The issue may be mostly edge cases, like a DivalentLonePair vsite on a water with an in-plane angle and match=once.
	SB – I wouldn’t worry too much about those. Fine to skip those in testing. I can also suggest vsites/mols that will exercise hard cases.
	MT – I’m pretty sure I can support SB’s intent in these tests. But it may be good to have a clearer delineation of “edge case”, so I’d really like to make sure that the test molecules and FFs have every realistic case that Interchange might be expected to cover.
	SB – DC’s group did a systematic study of which moieties would get lone pairs. That could be a good source of “important cases”. They published this a while back; it would be reasonable to support the ones that they mentioned.
	MT – Will do. Thanks.
	CC – Re: exporting to AMBER: the figure that SB posted is from the docs for mdgx. That’s a CPU-only engine which has different support for vsites than the high-performance AMBER code (pmemd).
	DD
	QCArchive
		cut release 0.22.0 of QCEngine with Lori early last week; on `conda-forge`
		includes `pycpuinfo` fix
		Ben and I established this as compatible with QCFractal `master`
		signal handling for QCFractal managers also just merged by Ben, release is imminent; can then deploy this to all compute resources and scale back up on PRP and HPC (e.g. Lilac)
		re-rolled all SPICE sets that had wavefunctions attached; attempted submission with single-points but still takes way too long in Actions automation
		will attempt to fix this week
		new PR on QCFractal for optimizing this a bit
		will make corresponding PR on openff-qcsubmit
		error cycling looks like it's choking as well due to memory usage from large DES370K set
		sigh
	DD – What’s the status of OMSF’s AWS account?
	Until we make the new QCF release it’d help to run this on a beefier EC2 instance, and we’ll want AWS set up anyway for other things in the near future.
	PB – Should we impose a limit on the size of molecule sets in the future?
	DD – We can probably use our judgement on this. Like, 500k-molecule sets will cause infrastructure issues, so we can block them or handle them on a case-by-case basis. I’d prefer to have a better knowledge of what sorts of datasets we should expect so that we can prepare for them ahead of time.
	PLBenchmarks
		Systems architecture doc draft complete for consumption by AWS folks; worked with John for editing
		worked with Jeff and Diego to spin up project management and working group
		call is out for joining as a stakeholder on `#developers`
	JW – I saw that RGowers gave detailed technical feedback on the spec doc. Should I encourage him to save this for the initial meeting?
	DD – No, I’ll address this when I have time.
	PB
	Mostly working on improper fitting and analysis last week.
	Started working on the Sage paper.
	Some QCA dataset management.
	Is it possible to check whether two conformers of a molecule have a chiral center flipped?
	JW – I think so, something like RDKit’s perceivestereofrom3d https://www.rdkit.org/docs/source/rdkit.Chem.rdmolops.html?highlight=stereo from 3d#rdkit.Chem.rdmolops.AssignStereochemistryFrom3D
	CC – So, you’re looking for cases in a large dataset where stereo has flipped?
	PB – Yes.
	JW
	Working on topology refactor: residue/chain info transfer and spec documentation
	Organizing working group for F@H infrastructure with Diego and David
	Issue feedback + PR reviews
	Vsites test cases for MT
	Start routing PRs to biopolymer refactor branch?
	MT – I’d asked about making the biopolymer topology branch into the new master, you had said “no” because we may need to cut an 0.10.3 release. Is that still current?
	JW – Yes, also I’d like 0.11.0 to have us switch to a `main` branch.
	I’m just worried that in a rushed critical bugfix release we’d launch the biopolymer topology refactor early by accident.
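
Follow-up sketch for PB's question about spotting a chiral center that has flipped between two conformers. JW pointed to RDKit's stereo-from-3D perception; the geometric idea behind that kind of check can be shown without any toolkit. Around a tetrahedral center, the sign of the scalar triple product of three center-to-neighbor vectors changes when the center inverts, and stays the same under rotation or translation. This is an illustration only, not what RDKit or the OpenFF Toolkit does internally. The names `signed_volume` and `flipped_centers`, the coordinate layout, and the tolerance are hypothetical choices, and the sketch assumes both conformers share the same atom ordering.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def signed_volume(center: Vec, n1: Vec, n2: Vec, n3: Vec) -> float:
    """Scalar triple product a . (b x c) of the three center-to-neighbor vectors."""
    ax, ay, az = (n1[i] - center[i] for i in range(3))
    bx, by, bz = (n2[i] - center[i] for i in range(3))
    cx, cy, cz = (n3[i] - center[i] for i in range(3))
    return (
        ax * (by * cz - bz * cy)
        + ay * (bz * cx - bx * cz)
        + az * (bx * cy - by * cx)
    )


def flipped_centers(
    conf_a: Sequence[Vec],
    conf_b: Sequence[Vec],
    centers: Dict[int, Tuple[int, int, int]],
    tol: float = 1e-3,  # hypothetical cutoff to skip near-planar geometries
) -> List[int]:
    """Return indices of centers whose handedness differs between two conformers.

    `centers` maps a stereocenter's atom index to three of its neighbors'
    indices; the same neighbor order is used for both conformers.
    """
    flipped = []
    for center, (i, j, k) in centers.items():
        vol_a = signed_volume(conf_a[center], conf_a[i], conf_a[j], conf_a[k])
        vol_b = signed_volume(conf_b[center], conf_b[i], conf_b[j], conf_b[k])
        # Opposite signs mean the three neighbors wind the other way round.
        if abs(vol_a) > tol and abs(vol_b) > tol and vol_a * vol_b < 0:
            flipped.append(center)
    return flipped


# Toy data: an idealized tetrahedral center (atom 0) and its mirror image.
conf_a = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (1.0, -1.0, -1.0),
    (-1.0, 1.0, -1.0),
    (-1.0, -1.0, 1.0),
]
conf_b = [(-x, y, z) for x, y, z in conf_a]  # reflect through the yz plane
centers = {0: (1, 2, 3)}

print(flipped_centers(conf_a, conf_b, centers))  # prints [0]
```

For a real dataset the center and neighbor indices would come from a cheminformatics toolkit's stereocenter perception rather than being written by hand, and a toolkit routine such as the RDKit function linked above is likely the more robust route for anything beyond simple tetrahedral carbons.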
Sprint planning

Action items

Decisions