2021-02-22 Developers Coffee Meeting notes

Date

Feb 22, 2021

Participants

  • @Jeffrey Wagner

  • @Matt Thompson

  • @Pavan Behara

  • @Simon Boothroyd

  • @David Dotson

Discussion topics

Item

Notes

Updates

  • SB

    • Lots of science work. Thinking about WBO fitting, and considering whether it’s ready for Sage or whether Sage should be delayed.

      • SB – The main issue we suspect is happening with AmberTools is conformer collapse. If positive and negative groups are near each other in a conformer, we could get large conformer-dependent charges or proton transfer. Also, if we have closely spaced conformers, several of them could minimize into the same minimum.

        • For electrostatic collapse, we could use heuristics to decide when to use restraints (e.g. check the output of an unconstrained optimization and see whether the coordinates change a lot).

      • JW – Possible ways forward could be to

        • Expand use of ELF conformers to AmberTools

        • Use restraints in AmberTools

          • SB – I don’t want to do the restrained fitting; it feels a lot like reverse engineering. I would prefer to do RMS checks for conformer collapse/proton transfer.

      • JW – Should we aim to get distinct conformers in ELF10? It seems like the current scheme could end up giving us a bunch of very similar conformers.

        • SB – This is possible. Bayly suggested using Omega’s dense conformer sampling.

    • Put together a package for interactive plotting of fitting results and molecule structure visualization, using plotly.

    • Working on Evaluator issues: getting NaNs in gradient calculations, and some Dask-based issues.

    • JW – I’ll look into doing AmberTools restraints, and ask someone like Jessica to regenerate the plot with those, to see if it’s a night-and-day difference.

  • MT

    • Continued working on getting the exporters to match energies. Currently iterating on a process of 1) get a molecule, 2) parameterize it into a System, 3) export to another format (OpenMM/GMX). OpenMM has been easy; GROMACS is hard.

      • Currently working on cyclohexane+ethanol exports. Energies match to 1e-6 for valence terms (for all engines!), but the nonbonded discrepancies are several orders of magnitude larger.

      • One issue is that GMX coordinates are only written to 3 digits past the decimal point, so to compare, I have to round off the OpenMM coordinates at the same place.

      • Not yet checked:

        • Constraints

        • Impropers

        • Nonperiodic torsions

      • DD – Constraints in GMX should be possible: you can make a separate constraints file that gets fed in. This may involve defining groups of atoms and enabling constraints on them by referencing those groups.

      • MT – For now I plan to explicitly not support this: warn the user that constraints aren’t being written out, and produce the constraint file later on the roadmap. The System object currently handles constraints like the toolkit does, through a separate handler.

    • MT – Also thinking about a path to having a curated set of systems/energy tests for different FFs/chemistries.

      • JW – This was a big point of discussion at a previous interoperability meeting. It may be good to connect with Chodera about this; we may be able to get a reference set of molecules+energies that we can validate against.

      • MT – I expect some friction when we realize that many things people are doing with typing/parameterization are actually wrong or underdefined.

      • JW – We could compile our own set of SMIRNOFF and GMX reference energies, and let the GMX developers correct us if we’re wrong.

    • MT – Swails is getting involved in the System work; it’s not yet clear in what capacity.

    • MT – I need to avoid scope creep on the fitting side. While we might eventually replace ForceBalance, I don’t consider the replacement part of the System objectives. So I’ll provide the API points that an ML process would need to get the data it wants, but I won’t implement the ML machinery under the hood.

      • JW – Agree. Replacing ForceBalance (especially for e.g. TorsionDrive fitting) is going to be insanely complicated.

  • DD

    • QCA: Added new submissions from users. Compute is saturated.

    • Currently focused on QCA submission infrastructure improvements. Looking to automate tasks and cache conda data.

    • Next:

      • I want to work on implementing standards v3.

      • Continue debugging PEPCONF.

        • PB found some settings that can fix SCF convergence errors. We may implement this as a two-pass approach: if a job fails initially, it gets resubmitted with different settings.

        • TG found a case of Psi4 segfaulting. Working on reproducing and diagnosing this; would like to provide a reproducing case to the Psi4 devs. If we can figure out the details of this failure, we could also improve error reporting (currently these failures are reported as successes).

    • Partner benchmarks: 7/10 partners are either preparing or running production datasets.

      • Working towards a bugfix release of the benchmark package; it will include mirror-image deduplication.

      • D Hahn is working on OPLS3e energy computations. I’ll be the code reviewer for that, and Swope will be the scientific/user-side reviewer.

    • Preparing partner datasets for submission. Would like these to be running by early March.

  • JW

    • PR reviews

      • 0.9.1 requirements (MT made a great project board for this)

      • AmberTools ELF10 PRs from Simon

        • Hit a conceptual roadblock when merging a PR that implements a new keyword for fractional bond order calculation (am1-wiberg-elf10), because we know that the workflow triggered by this keyword will change in the future. This means that, if we use this keyword in a force field, the numbers that come out will change with different versions of the toolkit.

          • DD – Even if we make these implementation-level changes, our methods ARE reproducible: people can check out old versions of the package to get the implementation that reproduces earlier results.

          • PB – It’s not clear that ATTKW (the AmberToolsToolkitWrapper) needs to “catch up” to the OpenEye defaults. It could be the other way around.

          • MT – As a user, I wouldn’t care too much about this; I’d trust the developers that each version is an improvement. How frequently do we expect users to update their conda environments? I’ve found that many users are working with very old toolkit versions.

    • Onboarding Josh Mitchell (documentation writer)

    • Worked with Slochower on toolkit usage and putting together workflows.

    • Andrew Dalke – Cheminformatics and Python wizard; we may bring him on as a contractor.

    • Ivan Pulido – Will soon be a software scientist in the Chodera lab; we’re bringing him on to OpenFF for a few months.

  • PB

    • Lots of WBO work with different potential fits. Studying whether the fits and initial proof-of-concept work extend to non-congeneric datasets of larger molecules.

    • Studying how to use TG’s fingerprinting method to cluster molecules in a DBSCAN-like way.

    • PEPCONF debugging – My current recommendation is to use second-order SCF methods when we get SCF convergence failures. It seems Swope is encountering the same problems.

      • DD – Agree that this is what Swope is seeing on his machines. The current performance is acceptable, but it would be great if we could get an automated solution implemented.
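SB’s preference for RMS checks to detect conformer collapse/proton transfer (instead of restrained fitting) could look roughly like the sketch below: optimize each conformer without restraints, then flag conformers whose geometry moved a lot. It compares interatomic distance matrices rather than Cartesian coordinates, so no superposition step is needed. This is a minimal sketch under stated assumptions: the function names, the plain-list coordinate format, and the 0.5 Å cutoff are hypothetical and were not decided in the meeting.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# One (x, y, z) tuple per atom, in angstrom (assumed representation).
Coords = Sequence[Tuple[float, float, float]]


def distance_matrix_rmsd(before: Coords, after: Coords) -> float:
    """RMS difference between all pairwise interatomic distances of two geometries.

    Interatomic distances are invariant to rotation and translation, so this
    needs no alignment step, unlike a Cartesian RMSD.
    """
    if len(before) != len(after):
        raise ValueError("geometries must have the same number of atoms")
    pairs = list(combinations(range(len(before)), 2))
    if not pairs:
        return 0.0
    squared = [
        (math.dist(before[i], before[j]) - math.dist(after[i], after[j])) ** 2
        for i, j in pairs
    ]
    return math.sqrt(sum(squared) / len(squared))


def flag_collapsed(
    inputs: List[Coords], optimized: List[Coords], threshold: float = 0.5
) -> List[int]:
    """Indices of conformers whose geometry changed by more than `threshold`
    (a hypothetical cutoff, in angstrom) during an unconstrained optimization."""
    return [
        index
        for index, (start, end) in enumerate(zip(inputs, optimized))
        if distance_matrix_rmsd(start, end) > threshold
    ]
```

Flagged conformers could then be dropped or re-run, which is the kind of heuristic SB described for deciding when restraints are needed.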
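MT’s workaround for the GMX coordinate precision can be sketched as: round the OpenMM positions to the precision GROMACS writes (a .gro file stores positions in nm with three decimal places), evaluate both engines at those identical coordinates, and compare the energies term by term. The energy evaluation itself is not shown. The function names and the per-term dictionaries (e.g. `{"bonds": ..., "vdW": ...}`) are hypothetical, and the 1e-6 default only mirrors the valence-term agreement reported in these notes.

```python
from typing import Dict, List, Sequence, Tuple

Position = Tuple[float, float, float]  # (x, y, z) in nm


def round_like_gro(positions: Sequence[Position], decimals: int = 3) -> List[Position]:
    """Round positions to the precision of a .gro file (3 decimal places in nm)."""
    return [
        (round(x, decimals), round(y, decimals), round(z, decimals))
        for x, y, z in positions
    ]


def mismatched_terms(
    reference: Dict[str, float], other: Dict[str, float], tolerance: float = 1e-6
) -> List[str]:
    """Names of energy terms whose absolute difference exceeds `tolerance`.

    Both dictionaries are assumed to use the same (hypothetical) term names
    and the same energy units.
    """
    unmatched = sorted(set(reference) ^ set(other))
    if unmatched:
        raise KeyError(f"terms present in only one engine: {unmatched}")
    return sorted(
        name for name in reference if abs(reference[name] - other[name]) > tolerance
    )
```

Checking terms separately keeps the valence comparison tight while making it obvious when only the nonbonded terms disagree.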
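The two-pass approach discussed for PEPCONF (resubmit a job that failed SCF convergence with different settings, such as PB’s second-order SCF recommendation) can be sketched as a small wrapper. Everything here is hypothetical: `run_job` stands in for whatever actually submits a calculation, `JobResult` and its `error_kind` label are invented for illustration, and the settings dictionaries are placeholders rather than the settings PB found.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Settings = Dict[str, object]


@dataclass
class JobResult:
    success: bool
    # Hypothetical failure label, e.g. "scf_convergence" or "segfault".
    error_kind: Optional[str] = None


def run_two_pass(
    run_job: Callable[[Settings], JobResult],
    first_settings: Settings,
    fallback_settings: Settings,
) -> List[JobResult]:
    """Run a job once; if it failed SCF convergence, resubmit it once with the
    fallback settings merged over the original ones. Returns every attempt."""
    attempts = [run_job(first_settings)]
    first = attempts[0]
    if not first.success and first.error_kind == "scf_convergence":
        # Only SCF failures are retried; other failures are left for diagnosis.
        attempts.append(run_job({**first_settings, **fallback_settings}))
    return attempts
```

Retrying only on a recognized SCF failure keeps problems like the Psi4 segfault visible instead of hiding them behind an automatic resubmission, which matters while those failures are still misreported.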
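PB’s plan to cluster molecules with TG’s fingerprinting method in a DBSCAN-like way can be illustrated with a minimal, dependency-free DBSCAN over Tanimoto distances. TG’s fingerprinting method isn’t described in these notes, so fingerprints are simply assumed to be sets of "on" bit indices; `eps` and `min_samples` are the usual DBSCAN parameters, and any values used with them here are illustrative.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # indices of "on" bits (assumed representation)

UNVISITED, NOISE = -2, -1


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - |a & b| / |a | b|; two empty fingerprints are treated as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def dbscan(fps: Sequence[Fingerprint], eps: float, min_samples: int) -> List[int]:
    """Label each fingerprint with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within Tanimoto distance `eps`; clusters grow outward from core points.
    """
    n = len(fps)
    neighbors = [
        [j for j in range(n) if tanimoto_distance(fps[i], fps[j]) <= eps]
        for i in range(n)
    ]
    labels = [UNVISITED] * n
    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be reclaimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels
```

The neighbor lists here are computed by brute force (O(n²) comparisons), which is fine for a proof of concept; larger datasets would need an indexed similarity search.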

Sprint planning

Return on the hour

Action items

Decisions