2021-06-14 Core Developers Meeting

Participants

@Andrew Dalke
@Jeffrey Wagner
@Lily Wang
@Matt Thompson
@David Dotson
@Pavan Behara

Goals

Discussion topics

Item	Notes

Updates

AD
- Posted about tool for “which subset of a dataset should I use for regression tests?”.
- (Posting some example outputs on #developers)
- Analyzed cases where different numbers of atoms come out; highlighted some particular molecular features that cause trouble or lead to different numbers of Hs.
- Found some issues with how SDF properties are transferred between cheminformatics toolkit molecules and OFFMols. This made molecule datasets look very different to the coverage analysis.
- Would like to abstract the low-level implementation of toolkit interfaces into a separate place, and have toolkit/utils call those from a high level.
MT
- Working on energy tests to reproduce OpenMM energies when exporting directly from the toolkit vs. from an OFF System object.
  - Single vs. multi-molecule systems
  - Single molecule energies match to 1e-3 J/mol (NOT kJ)
  - Condensed-ish phase calculations don’t match as well. Energy values are a bit rough, but parameter assignment comparison shows identical assignment. So I think it’s OK, but the energy mismatch is strange
- Only tested using cutoff electrostatics and periodic systems. Trying more things would lead to a combinatorial explosion in number of permutations.
- Did this analysis on ~70 mols. Would like to get access to more compute, and do >1k molecules. Also want to try some protein-ligand energies.
- VSites
  - Starting to implement support for vsites. It is hard – In toolkit there’s not much of a distinction between “parameter assignment” and “openMM export”.
  - Also started looking at whether our current implementation is at all compatible with GROMACS. Seems to be promising, but we won’t know until I try to implement it.
  - On Friday, started seeing if we could do minimal case of vsite export (since bondcharge vsite).
- JW – Check in?
  - (General) – SB/JW/MT will check in next week, start every-other-weekly schedule.
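
The single-molecule tolerance in MT's update above (1e-3 J/mol, not kJ/mol) is easy to get wrong by a factor of 1000 when energies are stored in kJ/mol. A minimal sketch of such a comparison follows. The function name, argument names, and example values are hypothetical and are not taken from the actual test suite.

```python
J_PER_KJ = 1000.0  # unit conversion: 1 kJ/mol = 1000 J/mol


def energies_match(e_toolkit_kj_mol, e_system_kj_mol, tol_j_mol=1e-3):
    """Return True if two energies given in kJ/mol agree within tol_j_mol J/mol.

    The tolerance is expressed in J/mol, so the difference is converted
    from kJ/mol before it is compared.
    """
    diff_j_mol = abs(e_toolkit_kj_mol - e_system_kj_mol) * J_PER_KJ
    return diff_j_mol <= tol_j_mol


# Hypothetical energies for illustration only.
close_pair = energies_match(12.3456789, 12.3456794)  # differ by ~5e-4 J/mol
far_pair = energies_match(12.345, 12.346)            # differ by ~1 J/mol
```

Keeping the tolerance in J/mol and converting explicitly makes the intended precision visible at the call site.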
DD
- Sick for much of last week
- Next pharma partner benchmark call is scheduled for July 8
- Will do individual partner outreach this week. Have 6/10 partners with datasets submitted.
- Working with Lorenzo D’Amore on some additional implementation for benchmarking and openff-geopt
- On QCA work, I need to check that the compute labeling/automation is working as intended in edge cases. Also need to take some of the error cycling datasets out for scientific review.
- About to resubmit new public industry set
- This week, will keep pushing on automation, review how the first year of QCA automation has gone, and determine whether more changes are needed.
- No update on PLBenchmarks.
JW
- Prepared and gave keynote
- More work on biopolymer topology. IP is moving to New York, and then beginning work with John.
- Cavender onboarding/QC Notebook session w/ Pavan
- Merged FF modification example from Josh Mitchell. Also worked on theme refactor. Interested in feedback on having separate docs theme repo.
  - DD – If we want to have a consistent identity, I don’t think we have another choice.
  - MT – In the scenario of “something breaks and nobody knows CSS”, can’t we remove it in that case?
  - JW – Yes, but I don’t like the approach of “if it breaks, throw it away”
  - LW – CSS is fairly approachable, can be dynamically changed in-browser.
    - DD – Agree. I’ve had good experiences with modifying CSS.
- Want to push for new release, final PR from my perspective is the charge rounding one.
- Will be giving a talk at the MolSSI workshop tomorrow morning. May be a bit unavailable while I throw together slides this afternoon.
LW
- Working on making sure the code to fragment big molecules + assign charges + cobble them back together is robust.
- Found some bugs in the process, fixed some in my local copy, but want feedback on whether these changes are wanted.
- Interested in more on biopolymer implementations and plans, but a lot of the written material is out of date.
  - JW – Let’s chat on Wednesday about this.
- LW – How reproducible should AM1BCC charges be?
  - JW – 99% of molecules should have all their atoms within 0.1 or 0.05 e- of each other.
  - LW – I think I found a case where the assigned charges differ depending on atom order
- AD – LW mentioned running black, do people here use pre-commit hooks?
  - MT – I use pre-commit CI. This is a service that lints the code and pushes the results to your branch.
  - AD – I noticed that isort wasn’t in developer build instructions
    - JW – This is correct. The old dev docs instructions and the new ones wouldn’t install black and isort.
    - MT – Linters aren’t in the test_env file, they’re in a separate CI action. This is easy because they’re not conda-installed.
  - The default developer docs should be updated to include installing black and isort
- MT will put together a demo of using pre-commit hooks
- MT - There are two ways to do pre-commit linting.
  - One way is to run locally, and have the automation run before you write the commit message. This is simple, and the git history truly records the author. The complexity here is that everyone needs to agree on the configuration (can be a shared yaml file in the repo).
  - The other way is to have a bot/service on a cloud host that fixes the failing files whenever the linting action WOULD fail. This basically runs the pre-commit hooks on whatever the user pushed. In this case the reformatting commits are authored by the bot. But this can be a fall-back: I usually have the pre-commit hooks running on my machine, and if they somehow miss something then the bot will author a commit to clean up.
- JW + DD – We’re interested in seeing a demo of pre-commit hooks next week
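
On LW's AM1BCC reproducibility question above, the criterion JW describes (all atoms of a molecule within 0.05 or 0.1 e of each other, for 99% of molecules) can be sketched as below. This is an illustrative check only: the function names, the optional atom-index mapping for comparing two atom orderings, and the example charges are all assumptions, not toolkit API.

```python
def max_charge_deviation(charges_a, charges_b, b_to_a=None):
    """Largest per-atom |q_a - q_b|, in elementary charge units.

    b_to_a[i] is the index in charges_a of the atom stored at position i
    in charges_b. If it is None, both lists are assumed to share one order.
    """
    if len(charges_a) != len(charges_b):
        raise ValueError("charge lists must have the same length")
    if b_to_a is None:
        b_to_a = range(len(charges_b))
    return max(abs(charges_a[i_a] - q_b) for i_a, q_b in zip(b_to_a, charges_b))


def fraction_reproducible(charge_pairs, tol=0.05):
    """Fraction of molecules whose every atom agrees to within tol."""
    n_ok = sum(1 for a, b in charge_pairs if max_charge_deviation(a, b) <= tol)
    return n_ok / len(charge_pairs)


# Hypothetical charges: the same molecule with its atoms listed in two orders.
original = [0.1, -0.3, 0.2]
reordered = [0.2, 0.1, -0.3]
mapped_deviation = max_charge_deviation(original, reordered, b_to_a=[2, 0, 1])
unmapped_deviation = max_charge_deviation(original, reordered)
```

Comparing through the mapping separates a real atom-order dependence in the charge model from a simple bookkeeping mismatch between the two orderings.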
PB
- More debugging of sulfonamide issue. It looks like b54 may be coupled to the bad angle parameter.
- Did a refit on the sage release candidate that included the dihedral RMSD in the optgeo scoring function.
  - JW – Is dihedral RMSD a new implementation?
  - PB – No, it was already implemented. Each type of RMSD/internal coordinate is scaled differently, but previously the dihedral RMSD denominator in the total score may have been too high, so it’s possible that the FF was being fit in a way that deprioritized them too much.
- Worked on Yuanqing’s reported bugs
  - JW – Were all of these caused by connectivity rearrangements?
  - PB – I saw it for one case, and sent YW a script to check for other rearrangements. He said it works.
- Made QCA molecule retrieval tutorial notebook, went through with Chapin/Wagner/Mitchell
- Tried out parallelized benchmarking scripts from SB on the UCI cluster. Now can do Lim/Hahn benchmarks in under an hour, rather than in 20 hours running in serial. Ran on “openff full optimization benchmark 1”
- This week, will continue theory benchmarking
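
PB's point above about the dihedral RMSD denominator can be illustrated with a simplified score in which each internal-coordinate type contributes (RMSD / denominator)^2. This is a sketch under that assumption, not the actual optgeo scoring code. The denominators and geometries are made-up values, and dihedral periodicity is ignored.

```python
import math


def rmsd(reference, computed):
    """Root-mean-square deviation between two equal-length value lists."""
    if len(reference) != len(computed) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((r - c) ** 2 for r, c in zip(reference, computed))
    return math.sqrt(squared / len(reference))


def optgeo_score(reference, computed, denominators):
    """Sum over coordinate types of (RMSD / denominator) squared.

    A larger denominator shrinks that type's contribution, so the fit
    has less incentive to reduce errors of that type.
    """
    return sum(
        (rmsd(reference[kind], computed[kind]) / denominators[kind]) ** 2
        for kind in reference
    )


# Hypothetical QM reference and MM geometries (bonds in angstrom, dihedrals in degrees).
ref = {"bond": [1.0, 1.5], "dihedral": [60.0, 180.0]}
mm = {"bond": [1.02, 1.48], "dihedral": [70.0, 170.0]}

score_tight = optgeo_score(ref, mm, {"bond": 0.02, "dihedral": 10.0})
score_loose = optgeo_score(ref, mm, {"bond": 0.02, "dihedral": 20.0})
```

With the same geometries, doubling the dihedral denominator cuts the dihedral term to a quarter of its value, which is the kind of deprioritization PB describes.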

PB – After doing a fit, how do you remove the “parameterize” attributes?
- JW – Read the OFFXML using allow_cosmetic_attributes=True, then write it with discard_cosmetic_attributes=True
- https://open-forcefield-toolkit.readthedocs.io/en/latest/api/generated/openff.toolkit.typing.engines.smirnoff.forcefield.ForceField.html#openff.toolkit.typing.engines.smirnoff.forcefield.ForceField.to_file
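
JW's answer above uses the toolkit itself: load with `ForceField(..., allow_cosmetic_attributes=True)`, then call `to_file(..., discard_cosmetic_attributes=True)`. For cases where the toolkit is not available, the sketch below removes a single named attribute with the standard-library XML parser. It is a stand-in technique, not the toolkit's implementation: it strips only the named attribute, whereas the toolkit discards all cosmetic attributes. The function name and the OFFXML snippet are illustrative.

```python
import xml.etree.ElementTree as ET


def strip_attribute(offxml_text, attribute="parameterize"):
    """Return the XML text with `attribute` removed from every element."""
    root = ET.fromstring(offxml_text)
    for element in root.iter():
        # pop() with a default is a no-op for elements lacking the attribute
        element.attrib.pop(attribute, None)
    return ET.tostring(root, encoding="unicode")


# Made-up OFFXML fragment carrying a fitting-only "parameterize" attribute.
example = (
    '<SMIRNOFF version="0.3"><Bonds version="0.3">'
    '<Bond smirks="[#6:1]-[#6:2]" length="1.5 * angstrom" id="b1" '
    'parameterize="k, length"/>'
    "</Bonds></SMIRNOFF>"
)
cleaned = strip_attribute(example)
```

The toolkit route is preferable when it is installed, since it also validates the force field on load.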
