2020-07-20 Developers Meeting notes

Date

Jul 20, 2020

Participants

@Jeffrey Wagner
@Matt Thompson
@David Dotson
@Simon Boothroyd

Discussion topics

Notes

0.7.1 release – MT will add deepcopy to are_isomorphic, ping Jeff if we hit trouble.

MT – Not a huge amount of progress on System. Mostly devops/toolkit fires and cleaning up toolkit PRs. Lots of conda work – taking a lot of time, but learning a lot.
DD – Mostly worked on QCA submission lifecycle. Worked with JH on automation and tracking on GH. Will put into production soon. First to submit will be Rowley biaryl set. Updating psi4 used in builds since current prod build is a year old. Focused on helping JS get pAPRika branch merged.
SB – Made an automated pipeline for ES potentials. Parallelized with multiprocessing. Should be compatible with eventual QCF ESP calculations. Interface is pretty general, so it should be able to handle pulling explicit ESPs or e- densities.
- Recharge can optimize BCCs using a matrix formulation, and runs quite efficiently. This is performant enough that we can plug it into a Bayesian optimizer. So far only small runs using lightweight libs; could eventually use Pyro.
- Concerned in the long run about PyTorch’s parallelization schemes, and compatibility with our work. PyTorch assumes homogeneous computation at a high level and splits up the dataset across different processes early on.
  - This may be trouble if our optimizations are highly interdependent, since then information will need to be exchanged between lots of processes/between distant parts of the dataset.
  - MT – I’m also concerned about limitations on cross-talk as we think about different/experimental fitting strategies.
  - We plan for a lot of our future scaling to be smoothly handled by PyTorch’s performant underlying libraries, but it seems like the questions that we’re approaching may be fundamentally incompatible with their performance/parallelization schemes.
  - MT – Does it seem like other ML libraries may handle our problems better?
    - Hard to say. We need to pin down exactly what we mean when we say “parallelization”. How much are we splitting up the work? Just by system/simulation? By force term?
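A toy illustration of why a matrix formulation makes BCC fitting cheap. It assumes (this is not stated in the notes) that charges depend linearly on the BCC parameters through an assignment matrix, and that the ESP at each grid point is a sum of point-charge terms. All names, coordinates, and values below are made up for illustration; this is not Recharge's API.

```python
import math

# Toy two-atom system in arbitrary units with the Coulomb constant set to 1.
# Coordinates, charges, and names are illustrative assumptions only.
ATOMS = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
GRID = [(2.0, 0.0, 0.0), (-1.5, 0.5, 0.0), (0.5, 2.0, 0.0), (0.5, -1.0, 1.0)]
BASE_CHARGES = [0.1, -0.1]
# Assignment column T: the single BCC parameter adds charge to atom 0 and
# removes the same amount from atom 1, so the total charge is unchanged.
ASSIGNMENT = [1.0, -1.0]


def inverse_distances(grid, atoms):
    """Matrix D with D[i][j] = 1 / distance(grid point i, atom j)."""
    return [[1.0 / math.dist(g, a) for a in atoms] for g in grid]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def corrected_charges(bcc):
    """Charges after applying the correction: q = q0 + T * b."""
    return [q + t * bcc for q, t in zip(BASE_CHARGES, ASSIGNMENT)]


def fit_bcc(reference_esp):
    """Least-squares b minimizing |D (q0 + T b) - V_ref|^2 for one parameter."""
    d = inverse_distances(GRID, ATOMS)
    design = matvec(d, ASSIGNMENT)      # D T: ESP response per unit of b
    base_esp = matvec(d, BASE_CHARGES)  # D q0: ESP of the uncorrected charges
    residual = [v - e for v, e in zip(reference_esp, base_esp)]
    numerator = sum(a * r for a, r in zip(design, residual))
    return numerator / sum(a * a for a in design)


# Build a synthetic reference ESP from a known parameter, then recover it.
TRUE_BCC = 0.3
reference = matvec(inverse_distances(GRID, ATOMS), corrected_charges(TRUE_BCC))
fitted_bcc = fit_bcc(reference)
```

Under these assumptions the objective is linear least squares in the BCC parameters, so the matrices can be built once and each evaluation is a few matrix products, which is the kind of cost that makes repeated evaluation inside an optimizer practical.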
JW – AMBER FF porting. OE2020 fixes. Speccing CLI tool requests.
- Should make it possible for CLI tools to read molecule input from STDIN and write output to STDOUT. This would be triggered by -i -, where - means “STDIN”.
- Standard CLI infrastructure? argparse vs. click?
  - SB – Shared library of keywords for I/O opts?
- SB – Should study GROMACS tools as well as AmberTools.
  - DD – Do we want an openforcefield prefix on all our CLI tools?
  - e.g. openff-gen-confs.py or openff gen_confs
    - Pro: easy to access
    - Con: Hard to find source
- Should all CLI tools be wrappers around pre-existing functions?
  - Pro: Clear and maintainable
  - Pro: Anything you prototype in the CLI can be implemented (or parallelized!) in Python
  - Con: Hard to find/modify source
  - Con: Behavior is strictly tied to OFFTK release
- Should CLIs be accessible through a conda/setup.py entry point (copied into $miniconda/env/bin)?
  - Pro: easy to access
  - Con: Hard to find source
- Make separate openff-cli module/package?
  - Lets us have a different release pace, but also have this under our “guarantee of correctness” umbrella.
  - Could be where utils/structure.py ends up
  - If all CLIs live in openff/cli, and this is a Toolkit-centric repo, then where does the evaluator CLI go?
  - Import loops?
    - Could have a policy of “never import from openff-cli”. CLI can import from other packages, but nothing should ever import it.
  - Testing? Handling missing/optional dependencies?
    - Make sure we do lots of lazy-loading so that we don’t have hard dependencies.
    - This would let us modularize readers/writers from underlying functions, and not need to test all permutations of inputs for all methods separately.
    - There are dedicated CLI testing libraries that we could include.
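Several of the CLI ideas above can be sketched together: a thin argparse wrapper around a pre-existing function, with - standing for STDIN/STDOUT and an optional dependency imported lazily. This is only a sketch under assumptions. The program name openff-example, the function clean_records, and the --json flag are hypothetical placeholders, and the standard-library json module stands in for a heavier optional dependency.

```python
import argparse
import importlib
import sys


def lazy_import(module_name):
    """Import an optional dependency only when a code path needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"This option needs the optional dependency '{module_name}'."
        ) from error


def clean_records(text):
    """Hypothetical stand-in for a pre-existing library function.

    Returns the non-blank lines of `text`, stripped of whitespace.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def open_stream(path, mode, default):
    """Return `default` when path is "-", otherwise open the named file."""
    return default if path == "-" else open(path, mode)


def main(argv=None, stdin=None, stdout=None):
    parser = argparse.ArgumentParser(prog="openff-example")
    parser.add_argument("-i", "--input", default="-",
                        help='input file, or "-" for STDIN')
    parser.add_argument("-o", "--output", default="-",
                        help='output file, or "-" for STDOUT')
    parser.add_argument("--json", action="store_true",
                        help="write JSON (loads the optional module lazily)")
    args = parser.parse_args(argv)

    # Streams are injectable so the CLI can be tested without a subprocess.
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout

    source = open_stream(args.input, "r", stdin)
    try:
        text = source.read()
    finally:
        if source is not stdin:
            source.close()

    # The CLI only parses arguments and does I/O; the work stays in the library.
    records = clean_records(text)

    if args.json:
        json = lazy_import("json")  # stand-in for an optional dependency
        rendered = json.dumps(records) + "\n"
    else:
        rendered = "".join(record + "\n" for record in records)

    sink = open_stream(args.output, "w", stdout)
    try:
        sink.write(rendered)
    finally:
        if sink is not stdout:
            sink.close()
    return 0
```

Because main takes its arguments and streams as parameters, the same function could be registered as a console entry point or called directly from tests. A script would typically end with sys.exit(main()) under an if __name__ == "__main__" guard.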