2021-08-16 Core Developers meeting notes

Participants

  • @Jeffrey Wagner

  • @Lily Wang

  • @Matt Thompson

  • @David Dotson

  • @Pavan Behara

  • @Chapin Cavender

Discussion topics

Item

Notes

Updates

  • MT

    • Developed and merged the SMARTS matching PR with the unique flag from LW. Any ParameterHandler could use it, but only LibraryCharges has it on by default.

    • Worked with JMitchell a bit.

      • There’s a new OpenFF Sphinx theme being rolled out to the core repos. It sounded like there may be some issues with Pydantic.

      • Talked about two options for autogenerating API docs (AutoAPI and AutoSummary); I arbitrarily chose AutoSummary.

        • LW – Is there a record of this discussion?

        • MT – JMitchell made some notes about this across a few PRs; I’ll link those in the meeting notes.

          • autosummary:

          • autoapi:

      • I’ve taken a few stabs at making a User Guide (“The Book”) and would like more feedback on it.

    • Worked on adding more systems and energy tests for Interchange.
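The unique flag on SMARTS matching is easiest to see on toy data. Below is a minimal pure-Python sketch, assuming that "unique" means collapsing matches that cover the same set of atoms in a different order. The helper name unique_matches and the example match tuples are hypothetical; this is not the toolkit's implementation.

```python
from typing import Iterable, List, Tuple

Match = Tuple[int, ...]


def unique_matches(matches: Iterable[Match]) -> List[Match]:
    """Keep the first match seen for each distinct set of atom indices.

    Hypothetical helper illustrating the idea of a "unique" SMARTS match;
    it is not the OpenFF toolkit's internal implementation.
    """
    seen = set()
    kept = []
    for match in matches:
        key = frozenset(match)  # order-independent identity of the match
        if key not in seen:
            seen.add(key)
            kept.append(match)
    return kept


# A symmetric two-atom pattern (e.g. a C-C bond) matched against ethane
# yields the same atom pair in both orders.
raw = [(0, 1), (1, 0)]
print(unique_matches(raw))  # [(0, 1)]
```

Keying on a frozenset makes the rule explicit: two matches count as duplicates when they touch exactly the same atoms, regardless of the order in which the SMARTS tags them.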

  • CC

    • Worked with DD to deploy a QCA worker on TSCC. It seems to have stayed up all weekend and completed lots of jobs. The success rate so far is about 45%, with lots of SCF convergence errors, so this could be a science/input problem; I’ll keep an eye on it. DD, is this normal?

      • DD – This is in the range of what we’d expect. There was a problem with the QCA server over the weekend; it seems to have been inaccessible for 7 hours on Sunday.

      • CC – I’m mostly seeing SCF errors, but I’ll keep an eye out for errors from the outage too.

      • DD – SCF errors are mostly deterministic, so restarting won’t fix them.

      • CC – The goal of this is to run dipeptide torsiondrives. They’re all on the openff-tscc tag.

    • CC – Working on preparing the dipeptide torsiondrive set. However, when I want to run backbone torsiondrives of the same molecule with different sidechain rotamers, QCSubmit merges the molecules by default and loses information. JHorton recommended generating the jobs separately and then adding them to the dataset one at a time to avoid the deduplication.

      • CC – I also need to figure out how to tell torsiondrive to keep the sidechain restrained during the backbone scan.

      • DD – Here’s how we prepared the torsiondrive datasets with DCerutti, where we restrained and scanned permutations of psi, phi, and chi1.

      • (the specific dataset prep notebook from that PR:

    • CC – 8 workers / 8 cores + 16 GB RAM are running now. I could quadruple this if needed, but these are shared nodes.
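For the open question of restraining the sidechain during a backbone scan, here is a sketch assuming the geomeTRIC-style constraint dictionary (a "freeze" list of entries with "type" and "indices"). Where this dict is placed in a QCSubmit/QCFractal torsiondrive spec is not shown here and should be checked against the dataset prep notebook DD linked. The helper names and atom indices are hypothetical placeholders.

```python
from typing import Dict, List, Sequence, Tuple

Dihedral = Tuple[int, int, int, int]


def _canonical(dihedral: Sequence[int]) -> Tuple[int, ...]:
    """A dihedral i-j-k-l is the same as l-k-j-i; pick one orientation."""
    forward = tuple(dihedral)
    return min(forward, forward[::-1])


def backbone_scan_constraints(
    scanned: Dihedral, frozen: Sequence[Dihedral]
) -> Dict[str, List[dict]]:
    """Build a geomeTRIC-style dict that freezes sidechain dihedrals.

    Hypothetical helper. The scanned backbone dihedral is driven by
    torsiondrive itself, so it must not also appear in the frozen list.
    """
    if _canonical(scanned) in {_canonical(d) for d in frozen}:
        raise ValueError("the scanned dihedral cannot also be frozen")
    return {
        "freeze": [
            {"type": "dihedral", "indices": list(d)} for d in frozen
        ]
    }


# Placeholder atom indices: phi (C-N-CA-C) is scanned, chi1 (N-CA-CB-CG) is frozen.
phi = (0, 4, 6, 14)
chi1 = (4, 6, 8, 11)
print(backbone_scan_constraints(phi, [chi1]))
# {'freeze': [{'type': 'dihedral', 'indices': [4, 6, 8, 11]}]}
```

Generating one such spec per rotamer and adding each job separately fits JHorton's suggestion, because every job keeps its own frozen chi1 value rather than being merged.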

  • DD

    • Partner benchmarking: all partner results are in.

      • Roche needs to re-send us results that don’t include OPLS (since they didn’t get Schrödinger approval).

      • Roche personnel (LD, DH, GTresadern) will coordinate collecting the OPLS and Sage results.

    • The QM-then-MM public dataset was fully submitted on Friday (recall that there were submission issues due to its scale). 66% of the 501k optimizations are complete. I’ll work with LD on exporting this dataset so it can be analyzed.

    • Deployed new production environments last week (thanks PB and JHorton for helping test). These include a stable release of Psi4 (the first in years), the latest openff-toolkit, etc.

      • I’ll coordinate with TG to update his manager configuration for the UCI cluster.

    • There was a public QCA outage yesterday. The root cause seems to have been the daily incremental backups: we’ve added lots of data recently, and that has broken things at our current scale.

    • Redid how error cycling works, since my original implementation was memory-inefficient. Now it crashes less.

    • Added automated tests to qca-dataset-submission (basically running qcsubmit tests before we deploy)

    • No progress on PLBenchmarks yet; I’m hoping to use bandwidth reclaimed from the partner benchmark to get moving on this.
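A common source of memory trouble in error cycling is materializing every errored record at once. Below is a minimal sketch of a streaming alternative. It assumes records are plain dicts with hypothetical "status" and "error" fields (not QCFractal's actual record schema), and it skips SCF convergence failures because, as DD noted in the CC discussion, restarting rarely fixes them. The marker string is also hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical marker; real QCFractal error messages may differ.
DETERMINISTIC_MARKERS = ("SCF did not converge",)


def restartable(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield errored records worth restarting, one at a time.

    The generator never builds a list of errored records; if `records` is
    itself lazy (e.g. a paginated query), memory use stays bounded.
    """
    for record in records:
        if record.get("status") != "ERROR":
            continue
        message = record.get("error", "")
        if any(marker in message for marker in DETERMINISTIC_MARKERS):
            continue  # deterministic failure: a restart would fail again
        yield record


records = [
    {"id": "1", "status": "COMPLETE", "error": ""},
    {"id": "2", "status": "ERROR", "error": "SCF did not converge"},
    {"id": "3", "status": "ERROR", "error": "worker killed"},
]
print([r["id"] for r in restartable(records)])  # ['3']
```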

  • JW

    • Out Friday and all next week. I’ll cancel the next core-devs meeting.

    • Finished looking into the 1-4 scaling issue. Wrote an OpenMM nonbonded energy decomposition that may be useful to others.

    • Started back up on biopolymer development. Prototyped hierarchy perception and a residue/chain iterator.

      • CC – Can we load from PDB yet?

      • JW – Not really. I’ll try to have this prototyped by the end of the week so folks can try it while I’m out.

    • SMARTS matching code profiling / Protein FF debugging

    • Sage review
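To show how 1-4 scaling enters a decomposed nonbonded energy, here is a minimal sketch for a single atom pair. It uses Coulomb plus 12-6 Lennard-Jones with Lorentz-Berthelot combining rules in OpenMM-style units (nm, e, kJ/mol). The scale factors 0.8333333333 (electrostatics) and 0.5 (vdW) are the usual SMIRNOFF/AMBER-style 1-4 defaults and are assumed here. This is not JW's OpenMM tool, which works on whole systems, and the pair parameters are made up.

```python
import math
from typing import Dict

# Coulomb constant in kJ/mol * nm / e^2 (OpenMM's ONE_4PI_EPS0).
COULOMB_CONST = 138.935456


def pair_nonbonded_energy(
    r: float,
    q1: float, q2: float,
    sigma1: float, sigma2: float,
    eps1: float, eps2: float,
    coulomb_scale: float = 1.0,
    vdw_scale: float = 1.0,
) -> Dict[str, float]:
    """Return scaled electrostatic and LJ energies (kJ/mol) for one pair.

    r and sigma are in nm, charges in e, epsilon in kJ/mol.
    """
    sigma = 0.5 * (sigma1 + sigma2)  # Lorentz rule
    eps = math.sqrt(eps1 * eps2)     # Berthelot rule
    sr6 = (sigma / r) ** 6
    electrostatic = coulomb_scale * COULOMB_CONST * q1 * q2 / r
    vdw = vdw_scale * 4.0 * eps * (sr6 * sr6 - sr6)
    return {"electrostatic": electrostatic, "vdw": vdw}


# Illustrative (made-up) parameters for a 1-4 pair 0.3 nm apart.
full = pair_nonbonded_energy(0.3, -0.4, 0.1, 0.33, 0.26, 0.36, 0.07)
scaled = pair_nonbonded_energy(0.3, -0.4, 0.1, 0.33, 0.26, 0.36, 0.07,
                               coulomb_scale=0.8333333333, vdw_scale=0.5)
print(full)
print(scaled)
```

Splitting the energy by term is what makes a 1-4 scaling bug visible: a wrong electrostatic scale changes only the first entry, and a wrong vdW scale changes only the second.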

  • PB

    • Worked on theory benchmarks. It looks like our default is pretty good, though our reference is biased toward our default spec, since the benchmark uses the final geometries from the default spec for energy evaluations with other basis sets. So I’m running torsiondrives with a higher-level basis/method.

      • JW – This “default bias” was also seen in the benchmarking project.

      • PB – There was also bias introduced in the other DOFs in the constrained torsiondrives.

      • CC – Which basis set?

      • PB – B3LYP-D3BJ DZVP

      • CC – How much more expensive is MP2 compared to the default?

      • PB – About 100x. In the context of torsiondrives, it’s more like 500x.

    • Working with a rotation student in the Mobley lab to run openff-evaluator. LW helped get this running in Colab. Now I’m working on an issue interfacing with the UCI queue: I want to use the GPU queue, but it’s using the default queue instead.

      • SB – When you create a job, there’s a queue backend keyword that can be populated.

      • PB – I think I set this. But the submission doesn’t reflect my changes, so maybe I’m modifying the wrong thing?

      • SB – Is it the dask-worker jobs that go to the standard queue?

      • PB – No, the job name I see is “wrap”

      • SB – It sounds like this isn’t spawning workers at all. Which queue system is it using?

      • PB – The UCI cluster could use LSF (bsub) or Slurm.

      • SB – We may need to make a new backend for Slurm systems. Evaluator uses a library called dask-jobqueue on the backend, so to iterate rapidly on this, it may be best to try a pure dask-jobqueue submission to isolate the submission issues.

      • PB – Sounds good. I’ll also pursue the lsf submissions.

      • SB – Is it using the bsub < input convention? I’ve had trouble with this, specifically the distinction between passing an input file and piping it in.

      • PB and SB will meet tomorrow at 8 AM pacific / 4 PM UK
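Before debugging Evaluator itself, it can help to confirm which queue directive ends up in the generated job script. dask-jobqueue cluster objects expose a job_script() method for inspecting the script they would submit. The sketch below is a hypothetical stand-in that needs no scheduler: it renders the header lines LSF and Slurm expect (#BSUB -q and #SBATCH --partition for the queue), so a missing or wrong line is easy to spot in a real script. The helper name, queue name "gpu", and job name are placeholders.

```python
from typing import List

# Scheduler-specific directive templates for choosing a queue/partition.
QUEUE_DIRECTIVES = {
    "lsf": "#BSUB -q {queue}",
    "slurm": "#SBATCH --partition={queue}",
}


def render_header(scheduler: str, queue: str, job_name: str) -> List[str]:
    """Render the job-script header lines for a given scheduler.

    Hypothetical helper for sanity-checking a submission; it does not
    submit anything.
    """
    scheduler = scheduler.lower()
    if scheduler not in QUEUE_DIRECTIVES:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    name_directive = (
        f"#BSUB -J {job_name}" if scheduler == "lsf"
        else f"#SBATCH --job-name={job_name}"
    )
    return [
        "#!/usr/bin/env bash",
        name_directive,
        QUEUE_DIRECTIVES[scheduler].format(queue=queue),
    ]


for sched in ("lsf", "slurm"):
    print("\n".join(render_header(sched, "gpu", "dask-worker")))
    print()
```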

  • LW

    • Looked at how much of a difference amino-acid neighbors make to the charges of a central residue, e.g., ACE-X-X-central-X-X-NME. I found that with 2 neighbors on each side, charges converge to within 0.01. Then I started looking into the effect of having different neighbors. I’m finding that the results are quite different (charged residues have quite an effect on the central residue).

      • I’m a little concerned about the possibility of OE AM1 introducing artifacts.

        • JW – We’re coming around to the idea that OE AM1 doesn’t have proton migration artifacts

        • LW – More specifically, I’m seeing things like ACE-VAL-X-central-ARG-Y-NME gives different results than ACE-VAL-Y-central-X-VAL-NME.

        • LW – Also differences in atom ordering are giving substantially different results

        • SB – I saw something similar in my work

        • CC – Could you pre-generate conformers to reduce this effect?

        • JW – Could use RDKit to generate conformers (removing the effect of atom ordering from OE conformer generation) and then turn off optimization so that atom order doesn’t affect the AM1 minimization/constraints.

        • SB – Let’s pick this back up at the FF release call on Thursday.

      • Used RDKit MMFF minimization for geometries.

    • I haven’t been getting most github emails due to a gmail issue, so sorry if I’ve been ignoring threads.
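LW’s convergence criterion (central-residue charges changing by less than 0.01 when another neighbor is added) can be written as a max-absolute-difference check. This is a minimal sketch with made-up placeholder charges. The per-atom comparison assumes both calculations list the central residue’s atoms in the same order, which the atom-ordering effects LW reported make worth verifying first.

```python
from typing import Sequence


def max_charge_change(before: Sequence[float], after: Sequence[float]) -> float:
    """Largest per-atom change in partial charge between two calculations."""
    if len(before) != len(after):
        raise ValueError("charge lists must cover the same atoms")
    return max(abs(a - b) for a, b in zip(before, after))


def converged(before: Sequence[float], after: Sequence[float],
              tol: float = 0.01) -> bool:
    """True if every atom's charge changed by less than tol."""
    return max_charge_change(before, after) < tol


# Placeholder central-residue charges with 1 vs. 2 neighbors per side.
one_neighbor = [-0.52, 0.31, 0.08, -0.47]
two_neighbors = [-0.515, 0.305, 0.083, -0.466]
print(converged(one_neighbor, two_neighbors))  # True
```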

  • SB

    • Finished modifying recharge to fully support vsites. It can now fit BCCs, vsite charge increments, and vsite geometries. The geometries are the hardest because they have a nonlinear effect on energies. These optimizations can now use numpy/pyro or pytorch, so it should be trivial to train vsites against ESP or electric field. This depends on TG’s parameter vectorization PR in the toolkit, so I’m working off of that branch.

      • JW – Is there a risk of duplicating work here? There’s the PR, a possible external implementation, and Interchange.

      • SB – I have an implementation that MT could look at to guide an Interchange implementation. vsites.py in

    • MT – Is vsite vdW fitting within the scope of recharge?

      • SB – No, right now we’re just doing electrostatics/charge increments. The optimization targets are all against ESP and electric field.
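SB’s point that vsite geometries are harder to fit than charges can be shown directly. The electrostatic potential is linear in the charges but not in the vsite position. Below is a minimal sketch, assuming a bond-charge-style vsite placed a distance d along the bond axis beyond atom 1. That sign convention is chosen here and is not taken from recharge or the SMIRNOFF spec. The coordinates, charges, and probe point are made up.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]

# Coulomb constant in kJ/mol * nm / e^2.
COULOMB_CONST = 138.935456


def bond_charge_site(r1: Vec, r2: Vec, d: float) -> Vec:
    """Place a vsite d nm from atom 1, on the side away from atom 2."""
    axis = [a - b for a, b in zip(r1, r2)]
    norm = math.sqrt(sum(c * c for c in axis))
    return tuple(a + d * c / norm for a, c in zip(r1, axis))


def esp(point: Vec, positions: Sequence[Vec], charges: Sequence[float]) -> float:
    """Electrostatic potential (kJ/mol/e) at point from point charges."""
    total = 0.0
    for pos, q in zip(positions, charges):
        total += q / math.dist(point, pos)
    return COULOMB_CONST * total


# Made-up example: two atoms on the x axis with a vsite beyond atom 1.
atom1, atom2 = (0.0, 0.0, 0.0), (0.17, 0.0, 0.0)
probe = (-0.4, 0.0, 0.0)
charges = [-0.1, 0.0, 0.1]  # atom1, atom2, vsite

for d in (0.05, 0.10):
    site = bond_charge_site(atom1, atom2, d)
    print(d, esp(probe, [atom1, atom2, site], charges))
```

Doubling every charge doubles the ESP, so BCCs and charge increments enter an ESP fit linearly. Moving the vsite changes its 1/|r - r_site| term nonlinearly, which is why geometry fits need an iterative optimizer.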

Sage release next steps



Action items

Decisions