2021-02-01 Core Devs update

Date

Feb 1, 2021

Participants

  • @Jeffrey Wagner

  • @Simon Boothroyd

  • @Connor Davel

  • @David Dotson

  • Mike Henry

  • @Matt Thompson

Goals

Discussion topics

Item

Notes

Updates

  • SB –

    • Changed how Evaluator categorizes compounds in physical property calculations. Compounds are usually categorized by their chemical functional groups. Previously each compound could belong to only one category; now it can belong to several.

    • Also added provenance information that records the host machine and dependency versions.

  • MT –

    • Split openff-toolkit into openff-toolkit and openff-toolkit-base. Thanks to SB and JRG for their help with the design. Added a test to the build process to ensure that nothing is broken. The base package now contains everything except AmberTools and RDKit. OpenMM is still in base, because it's tangled up throughout the codebase. This should fix the problems with production Psi4.

    • Made an openff-units package, mostly copied from Evaluator. This is still preliminary, and I wouldn't recommend using it in production.

      • Also want to find a more stable solution for the pinned unit registry (files copied from Pint).

        • SB – Quick note: these aren't from Pint verbatim. I deleted a lot of content that isn't relevant to us. Also, these are pinned to NIST CODATA 2018 (which OpenMM also uses).

    • Some discussion about the OpenMM Python namespace and units.

    • Working on a generic Topology representation. The current representation is cheminformatics-based and relies on cheminformatics for deduplication. Worked on making a copy that's based on atom types instead. After some work, I think it'd be easier to COPY the current implementation and turn off the cheminformatics features, keeping the TopologyMolecule/ReferenceMolecule convention around.

      • SB – Would like to talk about this a bit in the future.

      • JW – Please include me in the meeting as well.

    • Discussions with VU about the goals and future of the collaboration. Over the past months we've worked on understanding the needs of both sides of the project and on aligning interests. The PIs want the System to be a community-governed project that inherits the users and features of GMSO and OpenFF System, and that is extensible to other system formats as well.

  • CD –

    • No new overall results. Packaging the molecule equivalence testing infrastructure for an upcoming show-and-tell.

    • Chasing down two new problems.

      • OE adds too many protons to nitrogens. Lots of problems are caused by ambiguities in file representations.

      • Differences in stereochemistry around sulfoxides: both toolkits DEFINE stereo, but they frequently or always disagree.

  • MH –

    • Working on infrastructure for containerizing workflows. Big picture, this is for the SAMPL challenge; currently working on an MVP. We want people to containerize their prediction workflows so that molecules go in and predictions come out. Ideally, we could string these containers together into workflows so that people can plug and play combinations of steps. Building this as a Django web app.

    • Hopefully this will make predictions really reproducible: we could just have a Docker container with all of their dependencies and licenses.

      • JW – Data security is hard. I'd recommend not accepting licenses, or at least telling people there's no guarantee of security.

    • JW will add MH to Confluence

    • SB – If you do find you need to build RESTful APIs, I'd recommend using FastAPI.

      • MH – Keeping things pretty simple for now; still in the planning phase. But if FastAPI makes development faster, I'll look into it in the implementation phase.

    • JW – Would love to chat about file specifications and molecule representations.

    • DD – Where would the hardware infrastructure for this be hosted?

      • MH – AWS hosting.

      • DD – Would be happy to chat about this and software design.

      • MH – I’ll post design docs in the OpenFF Slack/Confluence.

  • DD –

    • Kicked off “season 1” of benchmarking, using geometry optimizations. Worked with DH, JH, JW.

    • Many of the partners have completed the “burn in” stage, where they deployed our software on internal resources and ran it on ~300 molecules. Planning on working directly with partners who are having infrastructure trouble this week.

    • Our production release is blocked by a decision on whether geometry benchmarking results should be segmented by molecule features (heavy atom count, rotatable bonds). We need a decision from Gary Tresadern (Janssen/Belgium) on this.

    • PEPCONF was submitted and is running through about 300 jobs/day.

      • Using DLC coordinates (as opposed to TRIC) gives roughly an order-of-magnitude improvement: removing translation and rotation makes optimizations converge in about 1/10th the number of steps. Also, the reset=True flag changes how the Hessian is stored between steps, and we’ve found that resetting it gives a big performance improvement as well.

    • Worked on local execution of ANI torsiondrives. Would like to avoid Fractal altogether on this pathway (currently using a QCFractal snowflake).

  • JW –

    • Worked with DD on benchmarking. Worked mostly on deployment, some software development.

    • Working on some project planning; we might be looking to bring on another core developer in a contractor-like role. Feel free to send me suggestions for good people you know, since we may not have time for a full candidate search.

    • MT – Also, we met with BioSimSpace about OpenFF compatibility.

  • PB –

    • Working on WBO-based parameter interpolation. Looked at how changes in prior values change the quality of fits. Also looking at which molecules show a large deviation in ddE.

      • JW – How long do ForceBalance runs take?

      • PB – For 10-11 interpolated parameters, I use 900 torsion targets, and it takes about 1 day on 40 cores. Refitting ALL parameters would take longer.

    • Also looking at the calculation of residuals (the QM energy difference minus the MM energy computed without the torsion in question).

    • Working on the Genentech dataset submission.


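SB's Evaluator change, where a compound can now fall into more than one functional-group category, can be illustrated with a minimal Python sketch. It assumes the functional groups of each compound have already been detected. The `categorize` function, the `detected_groups` mapping, and the example compounds are hypothetical and are not the Evaluator API.

```python
from collections import defaultdict
from typing import Dict, List, Set


def categorize(detected_groups: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Group compounds by functional-group category (hypothetical helper).

    Each compound is listed under *every* category it matches, rather than
    being forced into a single category.
    """
    categories: Dict[str, List[str]] = defaultdict(list)
    # Sort compounds and groups so the output order is deterministic.
    for compound in sorted(detected_groups):
        for group in sorted(detected_groups[compound]):
            categories[group].append(compound)
    return dict(categories)


# Hypothetical input: functional groups already detected for each compound.
example = {
    "ethanol": {"alcohol"},
    "ethanolamine": {"alcohol", "primary amine"},
}
print(categorize(example))
# {'alcohol': ['ethanol', 'ethanolamine'], 'primary amine': ['ethanolamine']}
```

Under the old one-category rule, ethanolamine would have had to be counted as either an alcohol or an amine; here it contributes to both.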

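PB's residual calculation can be sketched in Python as follows. The sketch assumes both scans are sampled at the same torsion grid points in the same energy units, that the MM energies were computed with the torsion term of interest removed, and that both profiles are taken relative to the QM minimum-energy grid point. The reference-point choice and the `torsion_residuals` name are assumptions for illustration, not PB's actual code.

```python
from typing import List, Sequence


def torsion_residuals(
    qm_energies: Sequence[float],
    mm_energies_no_torsion: Sequence[float],
) -> List[float]:
    """Residual profile for one torsion scan (illustrative sketch).

    Both profiles are shifted relative to the grid point with the lowest QM
    energy (an assumed convention), then the MM profile, computed WITHOUT the
    torsion term being fit, is subtracted from the QM profile.
    """
    if len(qm_energies) != len(mm_energies_no_torsion):
        raise ValueError("QM and MM scans must have the same grid points")
    # Index of the QM minimum, used as the common energy reference.
    ref = min(range(len(qm_energies)), key=lambda i: qm_energies[i])
    return [
        (e_qm - qm_energies[ref]) - (e_mm - mm_energies_no_torsion[ref])
        for e_qm, e_mm in zip(qm_energies, mm_energies_no_torsion)
    ]


# Made-up energies at four grid points, only to show the arithmetic.
print(torsion_residuals([0.0, 2.0, 5.0, 2.0], [1.0, 1.5, 3.0, 1.5]))
# [0.0, 1.5, 3.0, 1.5]
```

The resulting profile is what the torsion term being fit would ideally reproduce.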

Action items

Decisions