2021-08-09 Core Developers meeting notes

Participants

  • @Jeffrey Wagner

  • @Matt Thompson

  • @Lily Wang

  • @Pavan Behara

  • @David Dotson

  • @Simon Boothroyd

Discussion topics

Item

Notes

Updates

  • SB

    • Mainly working on extending TG’s work on recharge. TG added the ability to train the charge_increments of vsites (alongside BCCs on normal bonds) against ESP/EF data. Training vsite positions is harder because they enter the fit nonlinearly, and TG’s been working on that.

    • The package should support either PyTorch or NumPy, but in practice this is a pain, so I’ll probably learn a lot more about NumPy/JAX/TensorFlow compatibility here.

  • MT

    • Short week

    • SB was working on a submodule in qcsubmit that has been upstreamed into openff-utilities. Generally, if you’re doing work in a specialized package that isn’t specific to that package, consider upstreaming it.

    • Went to JW’s NIH talk.

    • Drew a line in the sand for “completion” of vsites. Got some emails indicating that people think I know more than I do. Current status is an RC release that supports vsites and bond order interpolation (I think BO interpolation is in a better state than vsites).

    • At this point, I’m looking for users to try these out. I can’t effectively move forward without feedback.

      • JW – would YQ benefit from this?

        • SB – I don’t think so. Espaloma outputs are already vectorized.

        • MT – Agree. I don’t see much of a benefit to YQ incorporating this.

      • SB – Biggest benefit would probably be straightforward energy evaluation that doesn’t require getting your hands on OpenMM. OpenEye’s requests would be the closest we could get to concrete users.

      • JW – Maybe we could get eyes on this with a developer week where people mess around with one-off projects?

      • PB – There’s a rotation student coming through the Mobley lab that will work on replicating solvation work.

        • SB – I think the plan there is to use Evaluator, which won’t use interchange until it’s more stable.

      • JW – Could use it to help debug things like 1-4 energies.

      • SB – I’m thinking about how I could be more useful. I’d like this to overlap with making packed boxes or combining components into a final system. But it depends on whether topologies store each molecule individually or use the TopologyMolecule/reference_molecule convention currently in use.

        • JW – I’m thinking of removing the TopologyMolecule convention.

        • (General) – More discussion to be had offline here.

    • Started working on biopolymer stuff using a Toolkit feature branch.

  • DD

    • Short week last week, will be back in town until October.

    • Released openff-benchmark 2021.08.04.0, which allows people to run Sage RC2, OPLS, and the special Swope/Lucas analysis. It also fixes a molecule export error, which should unblock Roche.

    • Worked with LD to move the torsion violation analysis forward. Looking to find bad existing torsion parameters. Once he finds the violations, he’ll be running torsiondrives on similar molecules from the Janssen internal set. This is some of what we’d want to include in season 2, so LD can be testing new behaviors here.

    • Need to follow up with DH and LD on OPLS coordination. This is particularly tricky because we don’t have the org bandwidth to continue coordinating, we can’t run OPLS locally because we don’t have Schrödinger licenses, and we want to focus on getting people finished with Sage RC2.

    • Working on moving the public benchmarking dataset forward. Lots of new issues here, for example:

      • The #complete count goes up and down, and we don’t know why.

      • The dataset is so large that error cycling crashes and keeps other datasets from getting attention.

      • PB reviewed and caught a major issue in MM-after-QM submission. I’ll work on addressing that.

    • Re-jiggered QM queues and workers. Expanded compute on lilac and routed PCM jobs to single-core workers.

    • Psi4 1.4 stable is out (first release in ~2 years). We’re going to start incorporating this into our deployed managers. May reach out to PB for validation. Ultimately want to automate validation.

    • Torsiondrives in QCEngine - SB made a really nice PR to do this, I’m pinging BP to get his feedback on this. Tiptoeing around MolSSI vacations at the moment. This is also relevant for Xavier Lucas at Roche, who wants to run torsiondrives with ANI, which could also be relevant to season 2.

    • On the protein-ligand front, I haven’t spent enough time on this; I’ll be reserving some time this week to focus on it.

  • JW

    • Made releases of toolkit 0.10.0, openff-forcefields (Sage-rc2), and openff-benchmark.

    • Lots of announcements about releases, September workshops, other stuff.

    • Gave the NIH talk, which took a lot of time. I think it may have been useful: many attendees and good discussion. I’m going to take a break from this for a while because I need to do OpenFF work.

    • ParmEd 1-4 bug. Will try to resolve this this morning, but it requires an OpenMM vs. AMBER energy comparison, which may be hairy.

    • Will work on biopolymer stuff this week.

    • Am ready to get the final Sage release from SB. I’ll run the openff-forcefields release once this is done. We can talk about how to coordinate provenance/releases/tags at that time.

  • LW

    • Working on polymetrizer to make OpenFF-style force fields. Trying to get it to a state where CCavender can use it. I had to refactor it to not use the OpenFF toolkit because I need dummy atoms, so I’m talking with CC now to see how this can tie in.

    • My current work is using the branch with SMARTS matching that has unique=True; this makes library charges go a lot faster (which is necessary for polymer work).

    • LW – Wondering about residue handling

      • JW – Still planning to follow topology plans made with Iván

    • LW – Also having trouble with charge convergence in capped/mid-chain residues. They don’t seem to converge until 10+ residue chains.

      • PB – Using geo opt?

      • LW – RDKit geo opt/minimization.

  • PB

    • Looked into Thomas Fox’s fragments. These overlap well with their QM counterparts, so I’m not seeing the violations that Fox did. Fox initially posted fragments that I couldn’t reproduce. Then he posted this second set, which also doesn’t reproduce the problems. So I’m adding random substituents to see if any of them will break it, like I did with bicyclopentane.

      • JW – The bicyclopentane issue ended up being a technical bug: no bonds rearranged, some FFs just squished the BCP group and made our connectivity checker fail.

      • SB – Are you trying to reproduce using the benchmarking infra or OpenFF scripts?

        • PB – Tried both for BCP and found different results. I’ve only used my own scripts for Fox’s reports.

        • SB – It’d be good to try to reproduce Fox’s errors with the benchmarking infra, since it’s heavily pinned.

    • Started doing some fitting experiments, like different strategies for whether to include dihedral RMSD and how to tune priors/hyperparameters. Looking to accumulate these studies in a repo so we can record how each factor affects fits.

    • Trying to wrap up the theory benchmark. Helping the rotation student in the Mobley lab reproduce cross-solvation energies; may need some help getting these set up. Current datasets use MNSol and FreeSolv, and I’m working to get these set up on the UCI cluster. So right now I have the student working through the examples/tutorials.

      • SB – Tutorials are probably the best way to start. I could see Dask being somewhat cumbersome to set up on the cluster. I can meet about how to get this set up right once they get to that point.

      • PB – Should we use GROMACS or OpenMM?

      • SB – It’s very layered. We use OpenMM at the very bottom of the stack, but it’s YANK calling OpenMM.

      • JW – I can also help debug once you start working on it, I’ve helped OMadin debug queue submission envs.

      • SB – The root of that would be in dask-jobqueue, so that’s a good place to look first in the future when you’re hitting issues with the environment of submitted jobs.

  • MT – Use of pre-commit: I’ve only used this for two-ish months. This is a tool that works “under” git. It could plug in at a bunch of places, but the main one is “a bunch of code that runs between you typing git commit and the files getting used in the actual diff”. pre-commit recently became its own package. So we could incorporate this into the toolkit to automatically run black and isort. It requires installing the pre-commit package and then installing its hooks in the repo (pre-commit install). Since development can get hairy if one dev uses it and another one doesn’t, people have found that it may be better to run things on CI. So there’s a popular pre-commit bot that can run in CI: it runs black and isort, and the bot user commits the changes. Some people are hesitant about bot commits, but the git history will still point to the correct PR since we do squash merges now. So in the long run I’d like to adopt something like this.

    • JW – I like leaving the yaml in the top level of the repo and letting developers “opt in” to using it, but I’m hesitant about bot commits right now.

    • DD – So, if we’re doing isort and black, could we make a PR where we test this?

    • MT – I’ll open a new issue to discuss the details/options here.

    • DD – I think, for most PRs, this would be valuable.

    • SB – MT, you already use this bot in interchange, right?

      • MT – Yes. I’ll collect some PRs. But since I’m the only developer and I use pre-commit, the bot has never modified a file.

      • SB – Does the bot comment in the PR? Or just commit and give a status marker?

      • MT – It doesn’t comment; it just commits and adds a status marker.

    • MT – I think this could replace a lot of what LGTM does, but without being as obscure/annoying as the current implementation of LGTM.

    • MT – I’ll open an issue on GH to discuss this.

    • PB – How is this different from dependabot making commits?

      • MT – It’s pretty similar. dependabot does make commits, but it only affects configuration files.

      • JW – I’m OK with dependabot changing configuration files, but I need to be able to deeply inspect the git history, and I don’t know what the “best practices” are for this, and I don’t want to make the toolkit harder to maintain by somehow cluttering the git tree such that future developers can’t understand the previous decisions/developments.

  • Return at 20 after to discuss polymer infrastructure.
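
The charge_increments SB describes can be illustrated with a minimal sketch. It assumes BCC-style increments where an increment on a pair of particles adds +delta to the first and -delta to the second. The sign convention and the function name are hypothetical, not recharge’s API. Because the final charges are linear in the increments, fitting them to ESP/EF data is a linear least-squares problem. A vsite position, by contrast, changes where its charge sits and enters the ESP through 1/r, which is why training positions is harder.

```python
from typing import Dict, List, Tuple


def apply_charge_increments(
    base_charges: List[float],
    increments: Dict[Tuple[int, int], float],
) -> List[float]:
    """Apply pairwise charge increments to a list of particle charges.

    Hypothetical convention: an increment `delta` on pair (i, j) adds +delta to
    particle i and -delta to particle j, so the total charge is conserved. A
    virtual site is treated as just another particle index with a base charge
    of zero.
    """
    charges = list(base_charges)  # copy so the caller's list is not mutated
    for (i, j), delta in increments.items():
        charges[i] += delta
        charges[j] -= delta
    return charges
```

Consider a three-point water with a fourth, zero-charge particle standing in for a vsite. An increment on the (oxygen, vsite) pair shifts charge between the oxygen and the vsite without changing the molecule’s net charge.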
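
SB asks whether topologies store each molecule individually or use the TopologyMolecule/reference_molecule convention. The shared-reference idea behind that convention can be sketched as follows. This is a minimal, hypothetical illustration, and the class names are invented rather than the toolkit’s API. Each chemical species is stored once, and every copy in the system records only which reference it uses and where its atoms start. The real convention also tracks atom-ordering permutations, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReferenceMolecule:
    """Hypothetical stand-in for a full molecule record (graph, charges, ...)."""

    name: str
    n_atoms: int


@dataclass
class MoleculeCopy:
    """Hypothetical stand-in for one copy of a molecule inside a topology."""

    reference: ReferenceMolecule
    atom_offset: int  # index of this copy's first atom in the whole topology


@dataclass
class SharedTopology:
    """Stores many copies while keeping one ReferenceMolecule per species."""

    copies: List[MoleculeCopy] = field(default_factory=list)
    n_atoms: int = 0

    def add(self, reference: ReferenceMolecule) -> None:
        # Each copy points at the shared reference instead of duplicating it.
        self.copies.append(MoleculeCopy(reference, self.n_atoms))
        self.n_atoms += reference.n_atoms

    def unique_references(self) -> List[ReferenceMolecule]:
        # Deduplicate by object identity, preserving first-seen order.
        by_id: Dict[int, ReferenceMolecule] = {}
        for entry in self.copies:
            by_id.setdefault(id(entry.reference), entry.reference)
        return list(by_id.values())
```

With this layout, a packed box of one ligand and 1,000 waters holds only two reference molecules. Storing every molecule individually is simpler when editing a single copy, but it duplicates the per-species data for each copy.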
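
JW’s 1-4 energy comparison (the ParmEd bug) depends on keeping the two scaling conventions straight. AMBER expresses 1-4 scaling as divisors: scee = 1.2 for electrostatics and scnb = 2.0 for Lennard-Jones. OpenMM and SMIRNOFF-style force fields use multiplicative factors instead: 1/1.2 (about 0.8333) and 0.5. The sketch below converts between the two and is not a diagnosis of the ParmEd bug itself. It also shows that a truncated factor such as 0.8333 already produces a relative difference of about 4e-5 in the scaled 1-4 Coulomb energy. The Coulomb constant is an approximate value.

```python
from typing import Tuple

# Approximate Coulomb constant 1/(4*pi*eps0) in kJ mol^-1 nm e^-2.
COULOMB_CONSTANT = 138.935456


def amber_to_multiplicative(scee: float = 1.2, scnb: float = 2.0) -> Tuple[float, float]:
    """Convert AMBER 1-4 divisors into multiplicative (coulomb14, lj14) factors."""
    return 1.0 / scee, 1.0 / scnb


def coulomb_14_amber(qi: float, qj: float, r_nm: float, scee: float = 1.2) -> float:
    """AMBER style: the full Coulomb energy divided by scee."""
    return COULOMB_CONSTANT * qi * qj / r_nm / scee


def coulomb_14_scaled(qi: float, qj: float, r_nm: float, scale14: float) -> float:
    """Multiplicative style: the full Coulomb energy times scale14."""
    return COULOMB_CONSTANT * qi * qj / r_nm * scale14
```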
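
LW’s update mentions a SMARTS-matching branch with unique=True. Assume that option means “keep one match per distinct set of atoms”; RDKit exposes a similar idea as the uniquify argument of GetSubstructMatches. The semantics can then be sketched in plain Python. A pattern with symmetric tagged atoms matches the same atoms in several orderings, and deduplicating on the atom set collapses them. The function name is hypothetical. The real speedup comes from the matcher not enumerating the permutations in the first place, which post-hoc filtering like this does not reproduce. The sketch also assumes symmetry-equivalent positions carry the same library charge, since only one ordering survives.

```python
from typing import Iterable, List, Tuple


def unique_matches(matches: Iterable[Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Keep the first match seen for each distinct set of atom indices."""
    seen = set()
    kept = []
    for match in matches:
        key = frozenset(match)  # different orderings of the same atoms share a key
        if key not in seen:
            seen.add(key)
            kept.append(match)
    return kept
```

For example, an H-C-H pattern on methane (carbon 0, hydrogens 1 to 4) yields 12 ordered matches, which this filter reduces to 6.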
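
PB’s fitting experiments vary whether a dihedral RMSD term is included and how priors are tuned. Below is a minimal sketch of that kind of objective, in the spirit of ForceBalance-style regularization. It assumes a sum of squared residuals plus an L2 prior penalty that scales each parameter’s displacement from its starting value by a prior width. The names and the optional dihedral term are hypothetical; this is not the objective OpenFF uses.

```python
from typing import Dict, Sequence


def regularized_objective(
    residuals: Sequence[float],
    params: Dict[str, float],
    initial: Dict[str, float],
    prior_widths: Dict[str, float],
    dihedral_rmsd: float = 0.0,
    dihedral_weight: float = 0.0,
) -> float:
    """Least-squares objective with an L2 prior and an optional dihedral term."""
    # Data term: how far the model is from the reference (e.g. QM) data.
    data_term = sum(r * r for r in residuals)
    # Prior term: a narrow prior width makes moving a parameter more costly.
    prior_term = sum(
        ((params[name] - initial[name]) / prior_widths[name]) ** 2 for name in params
    )
    # Optional term; a weight of 0 switches the dihedral RMSD off entirely.
    return data_term + prior_term + dihedral_weight * dihedral_rmsd**2
```

Halving a prior width quadruples that parameter’s penalty for the same displacement. Prior widths and the dihedral weight are therefore natural knobs to sweep.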
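
MT describes pre-commit as “code that runs between you typing git commit and the files getting used in the actual diff”. A hand-rolled git hook makes that concrete. This is a sketch, not the pre-commit framework, which is configured through a .pre-commit-config.yaml file and installed with pre-commit install. The hook assumes git, isort, and black are on PATH and formats the staged Python files. If either formatter changed anything, it returns a non-zero status, which aborts the commit so the developer can review and re-stage.

```python
import subprocess
from typing import List


def staged_python_files(name_only_output: str) -> List[str]:
    """Parse `git diff --cached --name-only` output, keeping only .py files."""
    return [
        line.strip()
        for line in name_only_output.splitlines()
        if line.strip().endswith(".py")
    ]


def formatter_commands(files: List[str]) -> List[List[str]]:
    """Commands to run on the staged files: isort first, then black."""
    if not files:
        return []
    return [["isort", *files], ["black", *files]]


def main() -> int:
    # Added, copied, or modified files in the index (deleted files are skipped).
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    files = staged_python_files(result.stdout)
    if not files:
        return 0
    for cmd in formatter_commands(files):
        subprocess.run(cmd, check=True)
    # `git diff --quiet` exits non-zero when the working tree differs from the
    # index, i.e. when a formatter rewrote one of the staged files.
    changed = subprocess.run(["git", "diff", "--quiet", "--", *files]).returncode != 0
    return 1 if changed else 0
```

To try it, the file would need to be saved as .git/hooks/pre-commit, made executable, and end with raise SystemExit(main()). Unlike the pre-commit framework, this sketch does not stash unstaged changes, so partially staged files are reformatted as a whole.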

 

Action items

Decisions