
...

Item

Notes

General updates

- SB – Offline at the beginning and end of Feb.
- MT – I’ll be doing the testing this week and next, so I probably won’t need the next stage of approvals until two weeks from now.
- SB – JW, feel free to bump me from approver position if there’s a holdup.

Round table updates

SB
- Mostly working on GCNs. Building up scripts needed to move forward, building up new molecules by hand to cover things that AM1BCC covers … GNChargeModels GH repo under my personal org, should transfer to OpenFF org.
  - JW – This should be good to go whenever you’re ready.
  - SB – I wasn’t sure if we had a policy for what goes under the GitHub org vs. what stays in individual repos.
- Many of my molecules had trouble converting Toolkit E/Z bond stereo to RDKit local stereo. I’ve opened a new PR to do this more robustly.
  - JW – Same problem with OpenEye?
  - SB – I don’t think so, we can set CIP stereo more directly with OE. This direct setting of E/Z in RDKit is a known shortcoming, mentioned in mailing list.
- Also hitting size limits with some file storage, both on QCA and in research codebases. For example, I pulled down lots of QCA results, made ESPs from the results, and now I have these files that took a while to make that I’d like to store with my code.
  - JW – Usually I’d reach for GH release assets. Could also consider using Amazon storage, but I’m not sure how the management overhead and costs would scale here.
  - SB – Could use Git LFS on a larger scale.
  - DD – Who manages GH subscriptions/payments?
  - MT + JW – Chodera’s credit card
  - DD – Should probably switch this over to OMSF billing.
  - JW – I’m interested to know where git LFS actually lives and how much it costs, will add “do a research cycle” to sprint backlog.
  - SB – Looks like $5/month for 50 GB.
  - DD – Would this solve bandwidth-constraint issue? (are limited checkouts possible?) I had some experience with PLBenchmarks but could use a better understanding.
  - SB – IIRC, there’s some scope for cherrypicking which files come down.
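
A rough way to ballpark the Git LFS cost discussed above. This is only a sketch: it takes the "$5/month for 50 GB" figure SB quotes and assumes (not confirmed in these notes) that storage is billed in whole 50 GB blocks with no free allowance. The function name and defaults are hypothetical.

```python
import math


def monthly_lfs_cost(storage_gb, pack_gb=50, pack_price_usd=5):
    """Estimate monthly cost, assuming storage is sold in whole fixed-size blocks.

    pack_gb and pack_price_usd default to the figure quoted in the notes;
    block-wise billing is an assumption made for illustration.
    """
    if storage_gb <= 0:
        return 0
    packs = math.ceil(storage_gb / pack_gb)  # partial blocks round up
    return packs * pack_price_usd


# Example: 120 GB of generated ESP files would need three 50 GB blocks.
estimate = monthly_lfs_cost(120)
print(f"Estimated cost: ${estimate}/month")
```

Bandwidth limits, which DD raises, are not modeled here and would need a separate look at the actual billing terms.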
DN
- Learning about finances, thinking about how different people see OMSF’s strategy. I currently think it will act as a hub for different projects: right now OpenFE and OpenFF, and Interchange could be a spinoff.
  - JW – I don’t think Interchange should be considered as a spinoff - It’s so intertwined with our other infrastructure and science that it wouldn’t make sense for it to have a separate budget or management
  - MT – Largely agree. Right now Interchange is solidly under OpenFF umbrella. In the future, there’s scope for it to become heavily involved in other parts of the comp chem world.
  - DN – Gotcha. I’ve heard people talking as if there are two projects running, OpenFF and OpenFE. But there are really several significant efforts going on, including Interchange.
    - JW – And BespokeFit.
  - DN – I’m just reading into the finances, happy to send a summary as I gain a better understanding.
    - JW + SB – We’ve tried to stay separate from finance, since it introduces a lot of additional degrees of freedom. But happy to consult/advise as single events if needed.
CC
- Spent a lot of time on the LiveCoMS review. This is a review article on experimental datasets for benchmarking protein FFs, with contributions from a lot of external authors in our field. I’m revising it to have a consistent voice and tone, and will send the next draft to the coauthors next week. I’ve got a sense of the consensus recommendations from this manuscript and looked into the infrastructure requirements to implement them. They all look reasonable, and should largely slot into Evaluator.
- I submitted an abstract to the Biophysical Society meeting in late Feb. I’m putting together a poster on biopolymers in OpenFF, basically “hey! we can handle biopolymers now”. I’ll send out a draft by this afternoon and ask for feedback.
  - CC – Do we have a repo of common figures? I found OpenFF GDrive slides linked from Confluence, but the images are fixed size. Do we have any of them in a more scalable format?
  - SB – I have several of my figures in SVG format, will share these with you. In general it would be beneficial to have a central repo of scalable graphics.
  - CC – Where do we have the big “openff workflow” figure original?
  - SB – I’m not sure. Maybe ask KCJ in #internal channel?
  - JW – I’ll also check whether I have an original copy of that drawing.
MT
- Spent most of my effort hacking away at single-molecule regression tests for the Interchange rollout. Initially found discrepancies between (0.11.0 rc exported with use_interchange=True) and (0.10.2 stable release). Also tested against (0.11.0 rc exported using use_interchange=False). Found varying numbers of differences for all of these code paths. Then I tried to reproduce and was unable to find any differences between any code paths (except masses, because we had been changing the element package to one with different mass definitions).
- Will move on to vsites this week. Big question is whether all the vsite types can go out to AMBER.
  - JW – The AMBER export question has big strategic/research direction implications. Not sure how to start this conversation in project strategy/interchange planning.
  - SB – Agree. This would be really helpful to understand. Since AMBER can do TIP4P and TIP5P I think our big needs may be covered. Our big need would be nitrogens in heteroaromatic rings (like a 5-membered aromatic ring with an N)… Figure 16.1 in the AMBER manual gives me hope that we’ll have support for what we need. I’ve pasted this image on the #core-devs slack channel.
  - MT – I’m way less familiar with AMBER than I am with OpenMM/GROMACS, so this will require some learning from me. I’m optimistic that we can go to AMBER, but I don’t have much confidence without more research.
  - MT – Could use one or more scientists to consult on this. People mean different things when they say “support vsites”: one interpretation is “everything possible in the SMIRNOFF spec”, another is “just a few sorts of vsites are actually important to FF accuracy”.
  - SB – I kinda think that the whole SMIRNOFF spec is important. Each type has a case where it’s physically justified. The most important ones would seem to be divalentlonepair and trivalentlonepair.
  - MT – I’m more thinking about corner cases with permutations of the match argument, molecule symmetries, smirks with wildcards.
  - JW – Issue may be mostly edge cases, like a divalentlonepair vsite on a water with an in-plane angle and match=once.
  - SB – I wouldn’t worry too much about those - Fine to skip those in testing. I can also suggest vsites/mols that will exercise hard cases.
  - MT – I’m pretty sure I can support SB’s intent in these tests. But it may be good to have a clearer delineation of “edge case”, so I’d really like to make sure that the test molecules and FFs have every realistic case that interchange might be expected to cover.
  - SB – DC’s group did a systematic study of which moieties would get lone pairs. That could be a good source of “important cases” - they published this a while back, it would be reasonable to support the ones that they mentioned.
    - MT – Will do. Thanks.
- CC – Re: exporting to AMBER: the figure that SB posted is from the docs for mdgx. That’s a CPU-only engine which has different support for vsites than the high-performance AMBER code (pmemd).
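
The single-molecule regression comparison MT describes above (comparing the same molecule exported through different code paths, and setting aside the known mass differences) could be sketched roughly as follows. This is a minimal illustration, not MT's actual test code: the dictionary layout, field names, and numbers are all hypothetical, and it does not call the Toolkit or Interchange.

```python
import math


def diff_parameters(reference, candidate, ignore=("mass",), rel_tol=1e-6):
    """Return {(atom_index, field): (ref_value, cand_value)} for disagreements.

    Both inputs map an atom index to a dict of per-atom numeric fields.
    Fields named in `ignore` are skipped, e.g. masses that are expected
    to differ after a change of element package.
    """
    diffs = {}
    for key in sorted(set(reference) | set(candidate)):
        ref_fields = reference.get(key, {})
        cand_fields = candidate.get(key, {})
        for field in sorted(set(ref_fields) | set(cand_fields)):
            if field in ignore:
                continue
            ref_val = ref_fields.get(field)
            cand_val = cand_fields.get(field)
            if ref_val is None or cand_val is None:
                # Field present on one code path only.
                diffs[(key, field)] = (ref_val, cand_val)
            elif not math.isclose(ref_val, cand_val, rel_tol=rel_tol, abs_tol=1e-12):
                diffs[(key, field)] = (ref_val, cand_val)
    return diffs


# Made-up per-atom values standing in for two export code paths.
stable = {0: {"charge": -0.8, "sigma": 0.3, "mass": 15.999},
          1: {"charge": 0.4, "sigma": 0.1, "mass": 1.008}}
release_candidate = {0: {"charge": -0.8, "sigma": 0.3, "mass": 15.9994},
                     1: {"charge": 0.4, "sigma": 0.1, "mass": 1.00794}}

print(diff_parameters(stable, release_candidate))             # masses ignored
print(diff_parameters(stable, release_candidate, ignore=()))  # masses reported
```

Keeping the ignore list explicit makes it easy to see which differences are being accepted on purpose rather than silently tolerated.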
DD
- QCArchive
  - cut release 0.22.0 of QCEngine with Lori early last week; on conda-forge
    - includes pycpuinfo fix
    - Ben and I established this as compatible with QCFractal master
  - signal handling for QCFractal managers also just merged by Ben, release is imminent; can then deploy this to all compute resources and scale back up on PRP and HPC (e.g. Lilac)
  - re-rolled all SPICE sets that had wavefunctions attached; attempted submission with single-points but still takes way too long in Actions automation
    - will attempt to fix this week
    - new PR on QCFractal for optimizing this a bit
    - will make corresponding PR on openff-qcsubmit
  - error cycling looks like it's choking as well, due to memory usage from the large DES370K set (sigh)
    - DD – What’s the status of OMSF’s AWS account? Until we make the new QCF release it’d help to run this on a beefier EC2 instance, and we’ll want AWS set up anyway for other things in the near future.
    - PB – Should we impose a limit on the size of molecule sets in the future?
    - DD – We can probably use our judgement on this. Like, 500k molecule sets will cause infrastructure issues so we can block them or handle them on a case by case basis. I’d prefer to have a better knowledge of what sorts of datasets we should expect so that we can prepare for them ahead of time.
- PLBenchmarks
  - Systems architecture doc draft complete for consumption by AWS folks; worked with John for editing
  - worked with Jeff and Diego to spin up project management and working group
    - call is out for joining as a stakeholder on #developers
  - JW – I saw that RGowers gave detailed technical feedback on the spec doc - Should I encourage him to save this for the initial meeting?
    - DD – No, I’ll address this when I have time.
PB
- Mostly working on improper fitting and analysis last week.
- Started working on the Sage paper.
- Some QCA dataset management.
- Is it possible to check whether two conformers of a molecule have a chiral center flipped?
  - JW – I think so, something like RDKit’s perceivestereofrom3d
    - https://www.rdkit.org/docs/source/rdkit.Chem.rdmolops.html?highlight=stereo%20from%203d#rdkit.Chem.rdmolops.AssignStereochemistryFrom3D
  - CC – So, you’re looking for cases in a large dataset where stereo has flipped?
    - PB – Yes
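
For PB's question above, the RDKit function JW links is the toolkit-based route. Independently of any toolkit, the underlying geometric check can be sketched with the sign of a scalar triple product: if the signed volume spanned by three fixed neighbors of a center changes sign between two conformers, that center has inverted. The code below is a dependency-free illustration under stated assumptions: coordinates, atom indices, and the `centers` mapping are hypothetical, and atom ordering is assumed identical in both conformers.

```python
def signed_volume(center, a, b, c):
    """Scalar triple product (a-center) . ((b-center) x (c-center))."""
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return (ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx))


def flipped_centers(conf_a, conf_b, centers):
    """List center atoms whose handedness differs between two conformers.

    `centers` maps a center atom index to three neighbor indices,
    given in the same order for both conformers.
    """
    flipped = []
    for center, (i, j, k) in centers.items():
        vol_a = signed_volume(conf_a[center], conf_a[i], conf_a[j], conf_a[k])
        vol_b = signed_volume(conf_b[center], conf_b[i], conf_b[j], conf_b[k])
        if vol_a * vol_b < 0:  # opposite signs: the center is inverted
            flipped.append(center)
    return flipped


# Toy geometry: atom 0 is the center, atoms 1-4 are its neighbors.
conf_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, -1)]
conf_mirror = [(x, y, -z) for x, y, z in conf_a]  # reflection inverts the center
conf_rot = [(-y, x, z) for x, y, z in conf_a]     # rotation preserves it
centers = {0: (1, 2, 3)}

print(flipped_centers(conf_a, conf_mirror, centers))
print(flipped_centers(conf_a, conf_rot, centers))
```

Near-planar centers give volumes close to zero, so a real scan over a large dataset would probably want a tolerance rather than a bare sign test.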
JW
- Working on topology refactor - residue/chain info transfer and spec documentation
- Organizing working group for F@H infrastructure with Diego and David
- Issue feedback + PR reviews
- Vsites test cases for MT
- Start routing PRs to biopolymer refactor branch?
  - MT – I’d asked about making the biopolymer topology branch into the new master, you had said “no” because we may need to cut an 0.10.3 release. Is that still current?
    - JW – Yes. Also, I’d like us to switch to a main branch with 0.11.0. I’m just worried that in a rushed critical bugfix release we’d launch the biopolymer topology refactor early by accident.

Sprint planning

Trello card

https://trello.com/b/dzvFZnv4/infrastructure

Action items

Decisions